Abstract

Markov chain Monte Carlo (MCMC) algorithms are used to estimate features of interest 
of a distribution. The Monte Carlo error in estimation has an asymptotic normal distribution 
whose multivariate nature has so far been ignored in the MCMC community. We present a 
class of multivariate spectral variance estimators for the asymptotic covariance matrix in the 
Markov chain central limit theorem and provide conditions for strong consistency. We examine 
the finite sample properties of the multivariate spectral variance estimators and their eigenvalues
in the context of a vector autoregressive process of order 1. 

1 Introduction


Markov chain Monte Carlo (MCMC) methods are often required for parameter estimation in the 
statistical models encountered in modern applications. The typical MCMC experiment consists of 
simulating a Markov chain in order to estimate a vector of quantities, such as moments or quantiles, 
associated with the target distribution. However, the multivariate nature of the estimation has only 
rarely been acknowledged in the MCMC literature. We consider the situation where estimation of 
a vector of means is of interest. Given a multivariate Markov chain central limit theorem (CLT) for 
the sample mean vector, we show that a class of multivariate spectral variance estimators (MSVEs) 
are strongly consistent estimators of the covariance matrix in the asymptotic normal distribution. 
We also establish strong consistency of the eigenvalues of any strongly consistent estimator of the 
asymptotic covariance matrix. 

We know of no other comparable work in the context of MCMC. Kosorok (2000) did propose 
estimators of the asymptotic covariance matrix which generalized work in the univariate case by 
Geyer (1992). However, these estimators are asymptotically conservative and are based on the 
properties of reversible Markov chains, an assumption we do not make. There has been a substantial
amount of work in the univariate setting. In particular, Atchadé (2011) and Flegal and
Jones (2010) established strong consistency of certain univariate spectral variance estimators, but 
the multivariate problem is more complicated and requires much new work. Moreover, our work 
represents a substantial generalization of the univariate results and requires much weaker conditions 
on the Markov chain. Thus we also improve the current results in the univariate setting. 

We will give a more formal description of the problem studied here. Let $F$ be a probability
distribution with support $\mathsf{X}$, equipped with a countably generated $\sigma$-field $\mathcal{B}(\mathsf{X})$, and let
$g : \mathsf{X} \to \mathbb{R}^p$ be an $F$-integrable function such that

$$\theta := E_F g = \int_{\mathsf{X}} g(x) \, dF$$

is the $p$-dimensional vector of interest. Note that $\mathsf{X}$ and $\theta$ often have different dimensions. It
is common to resort to MCMC methods to estimate $\theta$ when it is difficult to obtain $\theta$ analytically
or to produce independent samples from $F$. MCMC is popular because it is straightforward
to simulate a Harris ergodic (i.e., aperiodic, $F$-irreducible, and positive Harris recurrent) Markov
chain having invariant distribution $F$ (Geyer, 2011; Liu, 2008; Robert and Casella, 2013). Letting
$X = \{X_1, X_2, X_3, \dots\}$ denote such a Markov chain, estimation is easy since, for any initial
distribution, with probability 1,

$$\theta_n := \frac{1}{n} \sum_{t=1}^{n} g(X_t) \to \theta \quad \text{as } n \to \infty . \tag{1.1}$$

Of course, for any $n$ there will be an unknown Monte Carlo error in estimation, $\theta_n - \theta$, and
assessment of this Monte Carlo error is critical to the reliability of the simulation results (Flegal 
et al., 2008; Flegal and Jones, 2011; Geyer, 1992; Jones and Hobert, 2001). However, the multivariate 
nature of the Monte Carlo error has been largely ignored in the MCMC literature (but see Gong 
and Flegal, 2015). 

Instead, the primary focus has been on assessing the univariate Monte Carlo error. Let $g^{(i)}$, $\theta_n^{(i)}$,
and $\theta^{(i)}$ denote the $i$th components of $g$, $\theta_n$, and $\theta$, respectively. Then $\theta_n^{(i)} - \theta^{(i)}$ is the unknown
Monte Carlo error of the $i$th component. The approximate sampling distribution of this error is
available via a Markov chain CLT if there exists $0 < \sigma_i^2 < \infty$ such that, as $n \to \infty$,

$$\sqrt{n} \, (\theta_n^{(i)} - \theta^{(i)}) \xrightarrow{d} N(0, \sigma_i^2) . \tag{1.2}$$

(See Jones (2004) and Roberts and Rosenthal (2004) for a discussion of the conditions for (1.2).)
Due to serial correlation in $X$, $\mathrm{Var}_F \, g^{(i)} \ne \sigma_i^2$, except in trivial cases. Nevertheless, consistent
estimation of $\sigma_i^2$ is key to constructing asymptotically valid confidence intervals for $\theta^{(i)}$ and hence
in assessing the reliability of the simulation results (Flegal and Gong, 2015; Flegal et al., 2008; Glynn
and Whitt, 1992; Jones et al., 2006; Jones and Hobert, 2001). Thus consistent estimation of $\sigma_i^2$ has
received significant attention; Atchadé (2011), Damerdji (1991), and Flegal and Jones (2010) studied
spectral variance estimators, Hobert et al. (2002) and Mykland et al. (1995) investigated estimators
based on regenerative simulation, and Jones et al. (2006) studied nonoverlapping batch means.
Geyer (1992) introduced asymptotically conservative estimators based on the spectral properties
of reversible Markov chains. Doss et al. (2014) considered univariate estimators in the context of
estimating quantiles.

In the multivariate setting, the approximate sampling distribution of the Monte Carlo error is
available via a Markov chain CLT if there exists a positive definite $p \times p$ matrix $\Sigma$ such that

$$\sqrt{n} \, (\theta_n - \theta) \xrightarrow{d} N_p(0, \Sigma) \quad \text{as } n \to \infty . \tag{1.3}$$

We consider a class of MSVEs of $\Sigma$ and provide conditions for strong consistency. Our main
assumption on the process is the existence of a multivariate strong invariance principle (SIP); that
is, we assume that the centered and appropriately scaled partial sum process is similar to a Brownian
motion. Specifically, an SIP holds for $\{g(X_t)\}_{t \ge 1}$ if there exists a $p \times p$ lower triangular matrix $L$
and an increasing function $\psi$ on the integers such that, with probability 1,

$$n(\theta_n - \theta) = L B(n) + O(\psi(n)) \quad \text{as } n \to \infty ,$$

where $B(n)$ denotes a $p$-dimensional standard Brownian motion and $L L^T = \Sigma$. If $\psi$ is such that
$\psi(n)/\sqrt{n} \to 0$ as $n \to \infty$, the SIP implies a strong law, a CLT, and a functional CLT for $\theta_n$. Under
moment conditions on $g$, an SIP with $\psi(n) = n^{1/2 - \lambda}$ for some $\lambda > 0$ holds for polynomially ergodic
Markov chains.

There has been a substantial amount of work in the context of MCMC on establishing that 
Markov chains are at least polynomially ergodic. An incomplete list is given by Acosta et al. 
(2015), Doss and Hobert (2010), Fort and Moulines (2003), Hobert and Geyer (1998), Jarner and 
Hansen (2000), Jarner and Roberts (2002), Jarner and Roberts (2007), Johnson and Geyer (2012), 
Johnson and Jones (2015), Jones et al. (2014), Marchev and Hobert (2004), Petrone et al. (1999), 
Roberts and Rosenthal (1999), Roberts and Tweedie (1996), Rosenthal (1996), Roy and Hobert 
(2007), Tan and Hobert (2012), Tan et al. (2013), and Tierney (1994). While establishing that a 
Markov chain is at least polynomially ergodic can be challenging, it is not the obstacle that it once 
was. 

1.1 Motivating Example 

As motivation for the use of multivariate methods, we present a simple Bayesian logistic regression
model. For $i = 1, \dots, K$, let $Y_i$ be a binary response variable. For the $i$th observation let
$X_i = (x_{i1}, x_{i2}, \dots, x_{i5})$ be the observed vector of predictors, then

$$Y_i \mid X_i, \beta \sim \text{Bernoulli}\left( \frac{1}{1 + e^{-X_i \beta}} \right), \quad \text{and} \quad \beta \sim N_5(0, I_5) . \tag{1.4}$$

The resulting posterior $F$ is intractable and hence MCMC is used to obtain estimates of the regression
coefficient, $\beta$. We use the logit dataset in the mcmc R package which contains four predictors
and 100 observations. The goal is to estimate the posterior mean of $\beta = (\beta_0, \beta_1, \beta_2, \beta_3, \beta_4)^T$. Thus
$g$ here is the identity function mapping to $\mathbb{R}^5$.

To sample from the posterior we use the Polya-Gamma Gibbs sampler of Polson et al. (2013)
(see the R package BayesLogit) which was shown to be uniformly ergodic by Choi and Hobert
(2013). Although the chain mixes fairly quickly as seen in the autocorrelation plot for $\beta_0$ in Figure
1, the cross-correlation plot between $\beta_0$ and $\beta_2$ indicates correlation across these components that is
ignored by univariate methods. As a result, in Figure 2 the multivariate confidence ellipse is oriented
along non-standard axes (see Vats et al. (2015) for details on how to construct such confidence
regions). The ellipse is compared to two univariate confidence boxes: the smaller uncorrected for
multiple testing and the larger corrected for two tests using a Bonferroni correction.

We assess the performance of these confidence regions by comparing their coverage probabilities
and volumes over 1000 independent replications for varying Monte Carlo sample sizes. In particular
we look at the volume to the $p$th root ($p = 5$ in this example). The ‘true’ posterior mean is
determined by obtaining a Monte Carlo estimate from a sample of length $10^9$. Results are presented
in Table 1. Note that as the Monte Carlo sample size increases, the multivariate methods produce
confidence regions with the nominal coverage probability of 90% with significantly lower volume
compared to the Bonferroni corrected regions. The uncorrected regions have far from desirable
coverage probabilities.

Figure 1: Autocorrelation plot for $\beta_0$ and the cross-correlation plot between $\beta_0$ and $\beta_2$ for a Monte
Carlo sample size of n = 10°.

One reason for the reduction in volume of the ellipsoid is that multivariate methods capture
information ignored by univariate analysis. This also leads to a better understanding of the effective
samples obtained in an MCMC sample. Vats et al. (2015) provide the following estimator of effective
sample size

$$\text{ESS} = n \left( \frac{|\Lambda|}{|\hat{\Sigma}|} \right)^{1/p} ,$$

where $\Lambda$ is the sample covariance matrix for $g(X_t)$, $\hat{\Sigma}$ is a strongly consistent estimator of $\Sigma$, and
$| \cdot |$ denotes determinant. They demonstrate the superiority of this estimator of effective sample size
to the univariate estimator of Kass et al. (1998) and Gong and Flegal (2015).
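
As a rough illustration, the sketch below evaluates an effective sample size of the form $n (|\Lambda| / |\hat{\Sigma}|)^{1/p}$, which is our reading of the estimator of Vats et al. (2015). The function name `multivariate_ess` and the two input matrices are hypothetical and chosen only so that the result is easy to check by hand; log-determinants are used for numerical stability.

```python
import numpy as np

def multivariate_ess(n, Lambda, Sigma_hat):
    """Return n * (|Lambda| / |Sigma_hat|)^(1/p) using log-determinants."""
    p = Lambda.shape[0]
    sign_l, logdet_l = np.linalg.slogdet(Lambda)
    sign_s, logdet_s = np.linalg.slogdet(Sigma_hat)
    if sign_l <= 0 or sign_s <= 0:
        raise ValueError("both matrices must have positive determinant")
    return n * np.exp((logdet_l - logdet_s) / p)

# Hypothetical inputs: a 2 x 2 sample covariance matrix and an estimate of
# Sigma that is four times larger, so the determinant ratio is 1/16.
Lambda = np.array([[1.0, 0.3], [0.3, 2.0]])
Sigma_hat = 4.0 * Lambda
ess = multivariate_ess(10_000, Lambda, Sigma_hat)
```

With these inputs the determinant ratio is $1/16$ and $p = 2$, so the effective sample size is one quarter of the nominal sample size.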

The rest of the paper is organized as follows. In Section 2 we formally define the MSVE and 
present conditions for strong consistency. We also establish strong consistency of the eigenvalues. 
Section 3 contains a simulation study where we investigate the finite sample properties of the MSVE 
in the context of a vector autoregressive process. Finally, we present a discussion in Section 4. Many 
technical details of the proofs from Section 2 are deferred to the appendices.

Figure 2: 90% confidence regions constructed using univariate and multivariate methods. The solid
ellipse is constructed using an MSVE, the dotted smaller box is constructed using an uncorrected
univariate spectral variance estimator and the dashed larger box is constructed using a univariate
spectral variance estimator corrected by Bonferroni.


2 Spectral Estimators and Results 

2.1 Definition of MSVE 

Let $Y_t = g(X_t) - \theta$, $t = 1, 2, 3, \dots$ and define the lag $s$, $s \ge 0$, autocovariance matrix as

$$\gamma(s) = \gamma(-s)^T = E_F \left[ Y_t Y_{t+s}^T \right] .$$

Define $I_s$ as $I_s = \{1, \dots, (n - s)\}$ for $s \ge 0$ and as $I_s = \{(1 - s), \dots, n\}$ for $s < 0$. Let
$\bar{Y}_n = n^{-1} \sum_{t=1}^{n} Y_t$ and define the lag $s$ sample autocovariance as

$$\gamma_n(s) = \frac{1}{n} \sum_{t \in I_s} (Y_t - \bar{Y}_n)(Y_{t+s} - \bar{Y}_n)^T . \tag{2.1}$$

The MSVE is a weighted and truncated sum of the lag $s$ sample autocovariances,

$$\hat{\Sigma}_S = \sum_{s = -(b_n - 1)}^{b_n - 1} w_n(s) \gamma_n(s) , \tag{2.2}$$

where $w_n(\cdot)$ is the lag window and $b_n$ is the truncation point.
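
The weighted, truncated sum of sample autocovariances described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the choice of the modified Bartlett window $1 - |s|/b_n$, and the simulated input are all assumptions made for the example.

```python
import numpy as np

def bartlett_window(s, b):
    """Modified Bartlett lag window: 1 - |s|/b for |s| < b, and 0 otherwise."""
    return max(0.0, 1.0 - abs(s) / b)

def sample_autocov(Y, s):
    """Lag-s sample autocovariance (1/n) sum_t (Y_t - Ybar)(Y_{t+s} - Ybar)^T, s >= 0."""
    n = Y.shape[0]
    Z = Y - Y.mean(axis=0)
    return Z[: n - s].T @ Z[s:] / n

def msve(Y, b, window=bartlett_window):
    """Weighted sum of sample autocovariances over lags -(b-1), ..., b-1."""
    Sigma_hat = sample_autocov(Y, 0).copy()
    for s in range(1, b):
        G = sample_autocov(Y, s)
        # The lag -s autocovariance is the transpose of the lag s one,
        # and the window is an even function of the lag.
        Sigma_hat += window(s, b) * (G + G.T)
    return Sigma_hat

# Hypothetical input: n = 5000 independent draws in dimension 3,
# with truncation point floor(n^(1/3)) = 17.
rng = np.random.default_rng(1)
Y = rng.standard_normal((5000, 3))
S = msve(Y, b=17)
```

Each row of `Y` holds one observation of $g(X_t)$. With `b = 1` only the lag 0 term survives, so the function returns the sample covariance matrix with divisor $n$.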

Table 1: Volume to the pth (p = 5) root and coverage probabilities for 90% confidence regions
constructed using MSVE, uncorrected univariate spectral estimators and Bonferroni corrected
univariate spectral estimators. Replications = 1000 and standard errors are indicated in parentheses.

n      MSVE                 Bonferroni corrected   Uncorrected

Volume to the 5th root
1e3    0.0574 (4.93e-05)    0.0687 (7.02e-05)      0.0483 (4.93e-05)
1e4    0.0189 (7.50e-06)    0.0226 (1.12e-05)      0.0160 (7.90e-06)
1e5    0.0061 (1.10e-06)    0.0073 (1.50e-06)      0.0051 (1.10e-06)

Coverage probabilities
1e3    0.853 (0.0112)       0.871 (0.0106)         0.549 (0.0157)
1e4    0.882 (0.0102)       0.904 (0.0093)         0.612 (0.0154)
1e5    0.895 (0.0097)       0.910 (0.0090)         0.602 (0.0155)


2.2 Strong Consistency of MSVE 
2.2.1 Strong Invariance Principle 


While Markov chains are our primary interest, we only require $\{X_t\}_{t \ge 1}$ to be a stochastic process
which satisfies a strong invariance principle or SIP. In the interest of clarity, the SIP was stated
somewhat loosely in Section 1. What follows is a formal statement of our assumption.

Recall that $F$ is a distribution having support $\mathsf{X}$, $g : \mathsf{X} \to \mathbb{R}^p$, and we are interested in estimating
$\theta = E_F g$. We assume $g^2$ (where the square is element-wise) is an $F$-integrable function. Set
$h(X_t) = [g(X_t) - \theta]^2$, let $\| \cdot \|$ denote the Euclidean norm, and let $B(t)$ denote a $p$-dimensional
standard Brownian motion.

We will require an SIP for the partial sums of both $g$ and $h$. We assume there exists a $p \times p$
lower triangular matrix $L$, an increasing function $\psi$ on the integers, a finite random variable $D$, and
a sufficiently rich probability space such that, with probability 1,

$$\left\| \sum_{t=1}^{n} g(X_t) - n\theta - L B(n) \right\| < D \psi(n) . \tag{2.3}$$

We also assume there exists a finite $p$-vector $\theta_h$, a $p \times p$ lower triangular matrix $L_h$, an increasing
function $\psi_h$ on the integers, a finite random variable $D_h$ and a sufficiently rich probability space
such that, with probability 1,

$$\left\| \sum_{t=1}^{n} h(X_t) - n\theta_h - L_h B(n) \right\| < D_h \psi_h(n) . \tag{2.4}$$

Remark 1. Strong invariance principles have attracted much research interest and have been shown
to hold for a wide variety of processes; see Section 4 for some discussion on this point. Results
from Kuelbs and Philipp (1980) show that for the Markov chains commonly encountered in MCMC
settings, (2.3) and (2.4) hold with $\psi(n) = \psi_h(n) = n^{1/2 - \lambda}$ for some $\lambda > 0$. The correlation of the
process is measured indirectly by $\lambda$ (Philipp and Stout, 1975); a large serial correlation implies $\lambda$
is closer to 0 while for less correlated processes $\lambda$ is closer to 1/2.


2.2.2 Strong Consistency 

In (2.2) we defined the MSVE as the weighted and truncated sum of the lag $s$ sample autocovariances.
We make the following assumptions on the lag window $w_n(\cdot)$ and the truncation point $b_n$.

Condition 1. The lag window $w_n(\cdot)$ is an even function defined on $\mathbb{Z}$ such that

(a) $|w_n(s)| \le 1$ for all $n$ and $s$,

(b) $w_n(0) = 1$ for all $n$, and

(c) $w_n(s) = 0$ for all $|s| \ge b_n$.

Anderson (1971) gives a list of lag windows that satisfy Condition 1. We will consider some of
these further in Section 2.2.4.

The following Conditions 2 and 3 are technical conditions ensuring that $b_n$ grows at the right
rate compared to $n$.

Condition 2. Let $b_n$ be an integer sequence such that $b_n \to \infty$ and $n/b_n \to \infty$ as $n \to \infty$ where $b_n$
and $n/b_n$ are non-decreasing.

Condition 3. Let $b_n$ be an integer sequence such that

(a) there exists a constant $c \ge 1$ such that $\sum_n (b_n/n)^c < \infty$,

(b) $b_n n^{-1} \log n \to 0$ as $n \to \infty$,

(c) $b_n^{-1} \log n = O(1)$, and

(d) $n > 2 b_n$.

If $b_n = \lfloor n^{\nu} \rfloor$, where $0 < \nu < 1$, then Condition 3 is satisfied if $n > 2^{1/(1 - \nu)}$.
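
The requirement that $n$ exceed twice the truncation point is easy to check numerically for a power-law truncation point $b_n = \lfloor n^{\nu} \rfloor$. The sketch below is only an illustration: the helper names are hypothetical, $\nu = 1/3$ is an arbitrary choice, and the small tolerance guards against floating-point results that land just below an integer for perfect powers.

```python
import math

def truncation_point(n, nu):
    """b_n = floor(n^nu); the tolerance protects perfect powers such as 1000^(1/3)."""
    return math.floor(n ** nu + 1e-9)

def n_exceeds_twice_b(n, nu):
    """Check the requirement n > 2 * b_n."""
    return n > 2 * truncation_point(n, nu)

nu = 1 / 3
# Sufficient bound 2^(1/(1 - nu)), which follows from n > 2 * n^nu
threshold = 2 ** (1 / (1 - nu))
checks = [n_exceeds_twice_b(n, nu) for n in range(3, 10_001)]
```

For $\nu = 1/3$ the bound is $2^{3/2}$, so the requirement holds for every integer $n \ge 3$ in the range checked.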

Define

$$\Delta_1 w_n(k) = w_n(k - 1) - w_n(k)$$

and

$$\Delta_2 w_n(k) = w_n(k - 1) - 2 w_n(k) + w_n(k + 1) .$$

Condition 4. Let $b_n$ be an integer sequence, $w_n$ be the lag window, and $\psi(n)$ and $\psi_h(n)$ be positive
functions on the integers such that,

(a) $b_n n^{-1} \sum_{k=1}^{b_n} k |\Delta_1 w_n(k)| \to 0$ as $n \to \infty$,

(b) $b_n \psi(n)^2 \log n \left( \sum_{k=1}^{b_n} |\Delta_2 w_n(k)| \right)^2 \to 0$ as $n \to \infty$,

(c) $\psi(n)^2 \sum_{k=1}^{b_n} |\Delta_2 w_n(k)| \to 0$ as $n \to \infty$,

(d) $b_n^{-1} \psi_h(n) \to 0$ as $n \to \infty$, and

(e) $b_n^{-1} \psi(n) \to 0$ as $n \to \infty$.

Condition 4a connects the truncation point $b_n$ to the lag window $w_n$. In Section 2.2.4 we will
present examples of lag windows that satisfy this condition. The functions $\psi(n)$ and $\psi_h(n)$ in
Conditions 4b, 4c, 4d, and 4e correspond to the functions described in (2.3) and (2.4) and thus
these four conditions connect the truncation point $b_n$, the lag window $w_n$, and the correlation of
the process, measured indirectly by $\psi(n)$ and $\psi_h(n)$. In Lemma 1 we present sufficient conditions
for Conditions 4a, 4b, and 4c.

Theorem 1. Suppose the strong invariance principles (2.3) and (2.4) hold. If Conditions 1, 2, 3,
and 4 hold, then $\hat{\Sigma}_S \to \Sigma$, with probability 1, as $n \to \infty$.

Outline of proof. The proof is split into several lemmas; see Appendix A for details. Define for
$l = 0, \dots, (n - b_n)$, $\bar{Y}_l(k) = k^{-1} \sum_{t=1}^{k} Y_{l+t}$ and

$$\hat{\Sigma}_{w,n} = \frac{1}{n} \sum_{l=0}^{n - b_n} \sum_{k=1}^{b_n} k^2 \Delta_2 w_n(k) [\bar{Y}_l(k) - \bar{Y}_n][\bar{Y}_l(k) - \bar{Y}_n]^T .$$

For $t = 1, \dots, n$, define $Z_t = Y_t - \bar{Y}_n$. Then, in Lemma 4 we show that $\hat{\Sigma}_{w,n} = \hat{\Sigma}_S - d_n$, where

$$d_n = \frac{1}{n} \left\{ \sum_{t=1}^{b_n} \Delta_1 w_n(t) \left( \sum_{l=1}^{t-1} Z_l Z_l^T + \sum_{l=n-b_n+t+1}^{n} Z_l Z_l^T \right) + \sum_{s=1}^{b_n - 1} \left[ \sum_{t=1}^{b_n - s} \Delta_1 w_n(s + t) \left( \sum_{l=1}^{t-1} (Z_l Z_{l+s}^T + Z_{l+s} Z_l^T) + \sum_{l=n-b_n+t+1}^{n-s} (Z_l Z_{l+s}^T + Z_{l+s} Z_l^T) \right) \right] \right\} . \tag{2.5}$$

Notice that in (2.5) we use the convention that empty sums are zero. In Lemma 9 we show that
$d_n \to 0$ as $n \to \infty$ with probability 1. Thus $\hat{\Sigma}_{w,n} - \hat{\Sigma}_S \to 0$, with probability 1, as $n \to \infty$. In
Lemma 14, we show that $\hat{\Sigma}_{w,n} \to \Sigma$, with probability 1, as $n \to \infty$, and the result follows. □


We use Theorem 1 to give conditions for the strong consistency of $\hat{\Sigma}_S$ when the underlying
stochastic process is a Harris ergodic Markov chain having invariant distribution $F$, but first we
need a couple of definitions. Recall that $F$ has support $\mathsf{X}$ and $\mathcal{B}(\mathsf{X})$ is a countably generated $\sigma$-field.
For $n \in \mathbb{N} = \{1, 2, 3, \dots\}$, let the $n$-step Markov kernel associated with $X$ starting at $x \in \mathsf{X}$ be
$P^n(x, dy)$. Then if $A \in \mathcal{B}(\mathsf{X})$ and $r \in \{1, 2, 3, \dots\}$, $P^n(x, A) = \Pr(X_{r+n} \in A \mid X_r = x)$. Let $\| \cdot \|_{TV}$
denote the total variation norm. The Markov chain is polynomially ergodic of order $\xi$ where $\xi > 0$
if there exists $M : \mathsf{X} \to \mathbb{R}^+$ with $E_F M < \infty$ such that

$$\| P^n(x, \cdot) - F(\cdot) \|_{TV} \le M(x) n^{-\xi} . \tag{2.6}$$

Notice that polynomial ergodicity is weaker than geometric or uniform ergodicity; see Meyn and
Tweedie (2009).

Remark 2. Polynomial ergodicity is often proved by establishing the following drift condition. For
a function $V : \mathsf{X} \to [1, \infty)$ there exists $d > 0$, $b < \infty$, and $0 \le \tau < 1$ such that for $x \in \mathsf{X}$

$$E[V(X_{n+1}) \mid X_n = x] - V(x) \le -d [V(x)]^{\tau} + b I(x \in C) ,$$

where $C$ is a small set. In order to verify that $E_F M < \infty$, it is sufficient to show that $E_F V < \infty$
by Theorem 14.3.7 in Meyn and Tweedie (2009).

Theorem 2. Suppose $E_F \|g\|^{4 + \delta} < \infty$ for some $\delta > 0$. Let $X$ be a polynomially ergodic Markov
chain of order $\xi > (1 + \epsilon)(1 + 2/\delta)$ for some $\epsilon > 0$. Then (2.3) and (2.4) hold with

$$\psi(n) = \psi_h(n) = n^{1/2 - \lambda}$$

for some $\lambda > 0$ that depends on $p$, $\epsilon$, and $\delta$. If Conditions 1, 2, 3, and 4 hold, then $\hat{\Sigma}_S \to \Sigma$, with
probability 1, as $n \to \infty$.

Proof. See Appendix A.4. □

Remark 3. We rely on results provided by Kuelbs and Philipp (1980) to establish the existence
of (2.3) and (2.4) in Theorem 2. However, the precise relationship of $\lambda$ with $p$, $\epsilon$, and $\delta$ is not
investigated in Kuelbs and Philipp (1980) and remains an open problem.

Remark 4. When $p = 1$, the MSVE reduces to the spectral variance estimator (SVE)
considered by Atchadé (2011), Damerdji (1991), and Flegal and Jones (2010). However, our result
requires weaker conditions. First notice that Flegal and Jones (2010) required weaker conditions
than Damerdji (1991). Thus we only need to compare Theorem 2 to the results in Atchadé (2011)
and Flegal and Jones (2010), both of whom required the Markov chains to be geometrically
ergodic and to satisfy a one-step minorization condition. Thus Theorem 2 substantially weakens the
conditions on the underlying Markov chain, while extending the results to the $p > 1$ setting.

2.2.3 Strong Consistency of Eigenvalues 

Having obtained a strongly consistent estimator of $\Sigma$, it is natural to consider the eigenvalues of
the estimator.

Theorem 3. Let $\hat{\Sigma}$ be any strongly consistent estimator of $\Sigma$ and let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$ be
the eigenvalues of $\Sigma$. Let $\hat{\lambda}_1, \dots, \hat{\lambda}_p$ be the $p$ eigenvalues of $\hat{\Sigma}$ such that $\hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \cdots \ge \hat{\lambda}_p$, then
$\hat{\lambda}_k \to \lambda_k$, with probability 1, as $n \to \infty$ for all $1 \le k \le p$.

Proof. Let $\| \cdot \|_F$ denote the Frobenius norm. By Weyl's inequality (Franklin, 2012), for $\epsilon > 0$, if
$\| \hat{\Sigma} - \Sigma \|_F \le \epsilon$, then for all $1 \le k \le p$, $|\hat{\lambda}_k - \lambda_k| \le \epsilon$, which gives the desired result. □

Remark 5. Theorem 3 immediately implies that under the conditions of either Theorem 1 or
Theorem 2 the sample eigenvalues of the MSVE are consistent for the population eigenvalues. That is,
$\hat{\lambda}_k \to \lambda_k$, with probability 1, as $n \to \infty$ for all $1 \le k \le p$.

Sample eigenvalues can play an important role in multivariate analyses. For example, the length
of any axis of the confidence region constructed from $\hat{\Sigma}_S$ is determined by the magnitude of the
relevant estimated sample eigenvalue. Thus the largest eigenvalue is associated with the axis having
the largest estimated Monte Carlo error. This also suggests that dimension reduction methods could
be useful in assessing the reliability of the simulation effort. Although this is a potentially interesting
research direction it is beyond the scope of this paper.
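
The eigenvalue bound used in the proof of Theorem 3 can be checked numerically on a small example. The matrices below are hypothetical: a randomly generated positive definite matrix stands in for the asymptotic covariance matrix and a small symmetric perturbation of it stands in for an estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4

# Hypothetical positive definite matrix and a symmetric perturbation of it
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)
E = 0.01 * rng.standard_normal((p, p))
Sigma_hat = Sigma + (E + E.T) / 2

# eigvalsh returns eigenvalues in ascending order; reverse them so that
# the largest eigenvalue comes first, as in Theorem 3
lam = np.linalg.eigvalsh(Sigma)[::-1]
lam_hat = np.linalg.eigvalsh(Sigma_hat)[::-1]

max_gap = np.max(np.abs(lam_hat - lam))
frob = np.linalg.norm(Sigma_hat - Sigma, ord="fro")
```

Here `max_gap`, the largest distance between matched ordered eigenvalues, never exceeds `frob`, the Frobenius norm of the perturbation.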

2.2.4 Lag Window Conditions


The following generalization of Lemma 7 in Flegal and Jones (2010) is useful for checking that a
lag window satisfies the conditions of Theorem 1.

Lemma 1. Reparameterize $w_n$ such that $w_n$ is defined on $[0, 1]$ and $w_n(0) = 1$ and $w_n(1) = 0$.
Further assume that $w_n$ is twice continuously differentiable and that there exist finite constants $D_1$
and $D_2$ such that $|w_n'(x)| \le D_1$ and $|w_n''(x)| \le D_2$. Then as $n \to \infty$,

1. Condition 4a holds if $b_n^2 n^{-1} \to 0$,

2. Conditions 4b and 4c hold if $b_n^{-1} \psi(n)^2 \log n \to 0$.

Proof. The argument is the same as that of Lemma 7 in Flegal and Jones (2010) and hence is
omitted. □

Remark 6. It is common to use $b_n = \lfloor n^{\nu} \rfloor$ in which case Conditions 4a, 4b, and 4c hold, if we choose
$0 < \nu < 1/2$ such that $n^{-\nu} \psi(n)^2 \log n \to 0$ as $n \to \infty$.

Remark 7. We now consider some examples of lag windows which satisfy Condition 1 and consider
whether Conditions 4a, 4b, and 4c hold.

1. Simple Truncation: $w_n(k) = I(|k| < b_n)$. Using this window the estimator obtained is
truncated at $b_n$ but weighted identically. In this case, $\Delta_2 w_n(k) = 0$ for $k = 1, \dots, b_n - 2$,
$\Delta_2 w_n(b_n - 1) = -1$ and $\Delta_2 w_n(b_n) = 1$. It is easy to see that Condition 4c is not satisfied.

2. Blackman-Tukey: $w_n(k) = [1 - 2a + 2a \cos(\pi |k| / b_n)] I(|k| < b_n)$ where $a > 0$. This is a
generalization of the Tukey-Hanning window where $a = 1/4$. For fixed $a$, the Blackman-Tukey
window satisfies the conditions of Lemma 1, thus Conditions 4a, 4b, and 4c hold if $b_n^2 n^{-1} \to 0$
and $b_n^{-1} \psi(n)^2 \log n \to 0$ as $n \to \infty$.

3. Parzen: $w_n(k) = [1 - |k|^q / b_n^q] I(|k| < b_n)$ for $q \in \mathbb{Z}^+$. When $q = 1$ this is the modified Bartlett
window. It is easy to show that the Parzen window satisfies the conditions for Lemma 1, and
thus Conditions 4a, 4b, and 4c hold if $b_n^2 n^{-1} \to 0$ and $b_n^{-1} \psi(n)^2 \log n \to 0$ as $n \to \infty$.

4. Scale-parameter modified Bartlett: $w_n(k) = [1 - \eta |k| / b_n] I(|k| < b_n)$ where $\eta$ is a positive
constant not equal to 1. Then $\Delta_1 w_n(k) = \eta b_n^{-1}$ for $k = 1, 2, \dots, b_n - 1$ and $\Delta_1 w_n(b_n) = 1 - \eta + \eta b_n^{-1}$
so that Condition 4a is satisfied when $b_n^2 n^{-1} \to 0$ as $n \to \infty$. Also, $\Delta_2 w_n(k) = 0$ for
$k = 1, 2, \dots, b_n - 2$, $\Delta_2 w_n(b_n - 1) = \eta - 1$ and $\Delta_2 w_n(b_n) = 1 - \eta + \eta b_n^{-1}$. We conclude that
$\sum_{k=1}^{b_n} |\Delta_2 w_n(k)|$ does not converge to 0 and hence Condition 4c is not satisfied.

Figure 3 provides a graph of the three lag windows we consider in the next section, specifically,
the modified Bartlett, Tukey-Hanning, and scale-parameter modified Bartlett windows. It is evident
that the modified Bartlett and Tukey-Hanning windows are similar and the scale-parameter modified
Bartlett window weighs the lags more severely.

Figure 3: Plot of three lag windows, modified Bartlett (Bartlett), Tukey-Hanning and the
scale-parameter Bartlett with scale parameter 2 (Scaled-Bartlett).
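
The behaviour of the second differences discussed in Remark 7 can be tabulated directly. The sketch below is illustrative only: the function names are hypothetical and the scale parameter is fixed at 2. Since $\psi(n)$ is positive and increasing, Condition 4c can hold only if the sum of absolute second differences tends to zero, so that sum is what the code evaluates.

```python
import math

def bartlett(k, b):
    """Modified Bartlett window."""
    return 1 - abs(k) / b if abs(k) < b else 0.0

def tukey_hanning(k, b):
    """Tukey-Hanning window (Blackman-Tukey with a = 1/4)."""
    return 0.5 * (1 + math.cos(math.pi * abs(k) / b)) if abs(k) < b else 0.0

def scaled_bartlett(k, b, eta=2.0):
    """Scale-parameter modified Bartlett window with scale eta."""
    return 1 - eta * abs(k) / b if abs(k) < b else 0.0

def sum_abs_second_diff(w, b):
    """Sum over k = 1, ..., b of |w(k - 1) - 2 w(k) + w(k + 1)|."""
    return sum(abs(w(k - 1, b) - 2 * w(k, b) + w(k + 1, b))
               for k in range(1, b + 1))

# Evaluate the sums for a few truncation points
table = {b: (sum_abs_second_diff(bartlett, b),
             sum_abs_second_diff(tukey_hanning, b),
             sum_abs_second_diff(scaled_bartlett, b))
         for b in (10, 100, 1000)}
```

For the modified Bartlett window the sum equals $1/b_n$ and for the Tukey-Hanning window it also shrinks as $b_n$ grows, while for the scale-parameter window with $\eta = 2$ it equals $2 - 2/b_n$ and so stays bounded away from zero.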

3 Simulation 

We consider some finite sample properties of the MSVE in the context of a vector autoregressive
process of order 1 or VAR(1). Let

$$y_t = \Phi y_{t-1} + \epsilon_t , \tag{3.1}$$

where $y_t \in \mathbb{R}^p$ for all $t$, $\Phi$ is a $p \times p$ matrix, $\epsilon_t \overset{iid}{\sim} N_p(0, W)$, and $y_0$ is the zero vector. While this is
a simple model, it is useful to study since we can control the correlation of the process.

We assume that the largest eigenvalue of $\Phi$, $\phi_{\max}$, is less than 1 in absolute value, in which case
the stationary distribution for the process is $F = N_p(0, V)$ where $\text{vec}(V) = (I_{p^2} - \Phi \otimes \Phi)^{-1} \text{vec}(W)$.
Here $\otimes$ denotes Kronecker product and $I_{p^2}$ is the $p^2 \times p^2$ identity matrix. With some algebra it can
be shown that the lag $s$ autocovariance matrix for $s > 0$ is

$$\gamma(s) = \Phi^s V \quad \text{and} \quad \gamma(-s) = V (\Phi^T)^s .$$

Consider estimating $E_F y$ with $\bar{y}_n$, the Monte Carlo estimate. Tjøstheim (1990) showed that the
process is geometrically ergodic as long as $|\phi_{\max}| < 1$. In fact, the smaller the largest eigenvalue,
the faster the process mixes. Since $F$ has a moment generating function, a CLT holds with

$$\begin{aligned}
\Sigma &= \sum_{s=-\infty}^{\infty} \gamma(s) \\
&= \sum_{s=0}^{\infty} \gamma(s) + \sum_{s=-\infty}^{0} \gamma(s) - V \\
&= \sum_{s=0}^{\infty} \Phi^s V + \sum_{s=-\infty}^{0} V (\Phi^T)^{-s} - V \\
&= (I - \Phi)^{-1} V + V (I - \Phi^T)^{-1} - V .
\end{aligned} \tag{3.2}$$
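
The closed form for the asymptotic covariance matrix of the VAR(1) process can be evaluated with a few linear solves. The sketch below is a hedged illustration: the function names are ours, the matrix `Phi` is a hypothetical example rather than one of the paper's settings, and the first order autoregressive covariance matrix is assumed to have entries $\rho^{|i - j|}$.

```python
import numpy as np

def ar1_cov(p, rho):
    """Assumed form of the first order autoregressive covariance: rho^|i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def stationary_cov(Phi, W):
    """Solve vec(V) = (I - Phi kron Phi)^{-1} vec(W), with column-stacking vec."""
    p = Phi.shape[0]
    vec_v = np.linalg.solve(np.eye(p * p) - np.kron(Phi, Phi),
                            W.flatten(order="F"))
    return vec_v.reshape((p, p), order="F")

def asymptotic_cov(Phi, V):
    """Sigma = (I - Phi)^{-1} V + V (I - Phi^T)^{-1} - V."""
    I = np.eye(Phi.shape[0])
    return np.linalg.solve(I - Phi, V) + V @ np.linalg.inv(I - Phi.T) - V

# Hypothetical 2 x 2 example with eigenvalues 0.2 and 0.1
Phi = np.array([[0.2, 0.1], [0.0, 0.1]])
W = ar1_cov(2, 0.5)
V = stationary_cov(Phi, W)
Sigma = asymptotic_cov(Phi, V)
```

The stationary covariance returned by `stationary_cov` satisfies $V = \Phi V \Phi^T + W$, and `Sigma` agrees with a long truncation of the series of autocovariances.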

For this process, we investigate the performance of the class of MSVE in estimating $\Sigma$. We
set $W$ to be the first order autoregressive covariance matrix with correlation $\rho = 0.5$ and present
simulation results for different settings of $\Phi$ and $p$. These settings are presented in Table 2. For
Settings 1 and 4, $\phi_{\max} = .2$, Settings 2 and 5, $\phi_{\max} = .6$ and Settings 3 and 6, $\phi_{\max} = .9$. Thus,
these three pairs of settings yield processes with different mixing rates.

We compare the performance of three lag windows: modified Bartlett, Tukey-Hanning, and
scale-parameter modified Bartlett with scale = 2. In Section 2 we showed that the modified Bartlett and
the Tukey-Hanning windows satisfy the conditions of Theorem 1 while the scale-parameter modified 
Bartlett does not. 

For each setting, we do the following in each of 100 independent replications. We observe
the process for a Monte Carlo sample size of 1e5, and calculate the three MSVEs at sample sizes
{1e3, 5e3, 1e4, 5e4, 1e5} with $b_n = \lfloor n^{1/3} \rfloor$. The error in estimation is determined by calculating the
average relative difference in Frobenius norm, i.e. $\| \hat{\Sigma} - \Sigma \|_F / \| \Sigma \|_F$ for each of the three windows
at all five Monte Carlo sample sizes.
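
A single replication of this kind of experiment might look like the following sketch. It is not the authors' simulation code: the dimension, the diagonal matrix `Phi`, the seed, and the sample size are hypothetical choices made to keep the example small, and only the modified Bartlett window is used.

```python
import numpy as np

rng = np.random.default_rng(42)
p, n = 2, 20_000
Phi = np.diag([0.2, 0.1])                  # hypothetical Phi with largest eigenvalue 0.2
W = np.array([[1.0, 0.5], [0.5, 1.0]])     # autoregressive-type covariance, rho = 0.5

# True Sigma from the closed form, via the stationary covariance V
V = np.linalg.solve(np.eye(p * p) - np.kron(Phi, Phi),
                    W.flatten(order="F")).reshape((p, p), order="F")
I = np.eye(p)
Sigma = np.linalg.solve(I - Phi, V) + V @ np.linalg.inv(I - Phi.T) - V

# Simulate y_t = Phi y_{t-1} + eps_t starting from the zero vector
eps = rng.multivariate_normal(np.zeros(p), W, size=n)
y = np.zeros((n, p))
prev = np.zeros(p)
for t in range(n):
    prev = Phi @ prev + eps[t]
    y[t] = prev

# MSVE with the modified Bartlett window and truncation point floor(n^(1/3))
b = int(np.floor(n ** (1 / 3) + 1e-9))
Z = y - y.mean(axis=0)
Sigma_hat = Z.T @ Z / n
for s in range(1, b):
    G = Z[: n - s].T @ Z[s:] / n           # lag s sample autocovariance
    Sigma_hat += (1 - s / b) * (G + G.T)   # adds the lag s and lag -s terms

rel_err = np.linalg.norm(Sigma_hat - Sigma, "fro") / np.linalg.norm(Sigma, "fro")
```

Averaging `rel_err` over independent replications and repeating at several sample sizes would give curves of the kind summarized in the simulation study.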

In Figure 4, we plot the results for all settings for all three lag windows. For Settings 1 and 4,
all three lag windows perform equally well while for Settings 3 and 6, the scale-parameter modified
Bartlett window performs poorly. In all settings, the modified Bartlett and the Tukey-Hanning
windows perform similarly, but the Tukey-Hanning window is slightly better when the chain mixes
more slowly. The plots also indicate that as $\phi_{\max}$ increases, a larger Monte Carlo sample size is
required for a desired error in estimation threshold. This is as expected since we know for higher
values of $\phi_{\max}$, the process mixes more slowly.

Table 2: Simulation settings 1 through 6.

Setting   p    Eigenvalues of $\Phi$ for $i = 0, \dots, p - 1$
1         10   $\lambda_i = .01 + i(.20 - .01)/(p - 1)$
2         10   $\lambda_i = .40 + i(.60 - .40)/(p - 1)$
3         10   $\lambda_i = .70 + i(.90 - .70)/(p - 1)$
4         50   $\lambda_i = .01 + i(.20 - .01)/(p - 1)$
5         50   $\lambda_i = .40 + i(.60 - .40)/(p - 1)$
6         50   $\lambda_i = .70 + i(.90 - .70)/(p - 1)$

In Section 2 (Theorem 3 and Remark 5) we established the convergence of the eigenvalues of the
MSVE. To study the finite sample properties of the maximum eigenvalue we observe its behavior
for the three different lag windows at different Monte Carlo sample sizes over each of 100 independent
replications. At each replication, we observe the relative error in estimation, $|\hat{\lambda}_1 - \lambda_1| / \lambda_1$. The results
are presented in Figure 5 and are similar to what was observed for the convergence of the MSVEs.
For Settings 2, 3, 5 and 6, the scale-parameter modified Bartlett window performs significantly
worse than the Tukey-Hanning and the modified Bartlett windows. When the chain mixes more
slowly, the Tukey-Hanning window appears to give slightly better results.

It is natural to investigate the stability of estimation of the largest eigenvalue. We study this
empirically for Setting 1 by observing the shape of the distribution of the maximum eigenvalue for
the estimates of $\Sigma$ obtained through the three lag windows at varying Monte Carlo sample sizes
over the 100 independent replications. Using (3.2), the true maximum eigenvalue for this setting is
2.683. In Figure 6, we notice that as the Monte Carlo sample size increases, the shape of the density
of the largest eigenvalue is increasingly symmetric and centered at this true value. In addition, as
the Monte Carlo sample size increases, the variance of the largest estimated eigenvalue decreases.
This is observed for all three lag windows.

Figure 4: $\| \hat{\Sigma}_S - \Sigma \|_F / \| \Sigma \|_F$ for the three lag windows at different Monte Carlo sample sizes for all
six settings averaged over 100 iterations. [Panels: Settings 1 to 6; horizontal axis: Monte Carlo
sample size; legend: Bartlett, Tukey-Hanning, Scaled-Bart.]

Figure 5: $|\hat{\lambda}_1 - \lambda_1| / \lambda_1$ for the three lag windows at different Monte Carlo sample sizes for all six
settings averaged over 100 iterations. [Panels: Settings 1 to 6; horizontal axis: Monte Carlo
sample size; legend: Bartlett, Tukey-Hanning, Scaled-Bart.]

Figure 6: Kernel density of the maximum eigenvalue for the three MSVEs over 100 replications and
increasing Monte Carlo sample sizes for Setting 1. The vertical line indicates the true eigenvalue of
2.683 calculated using (3.2). [Panels: (a) $n = 10^3$, (b) $n = 10^4$, (c) $n = 10^5$; legend: Bartlett,
Tukey-Hanning, Scaled-Bart.]


4 Discussion 


Estimation of the asymptotic covariance matrix in the CLT as in (1.3) has received little attention
in the MCMC literature thus far. With the results of this paper, practitioners are now equipped
with a class of strongly consistent multivariate spectral variance estimators of $\Sigma$.

However, multivariate spectral variance estimators are also encountered outside of the MCMC
context. For example, they are often used for heteroskedasticity and autocorrelation consistent (HAC)
estimation of covariance matrices, which arise in the study of the generalized method of
moments and of autoregressive processes with heteroskedastic errors. See Andrews (1991) for motivating
examples. In the context of HAC estimation, De Jong (2000) obtained conditions under which
the class of MSVEs is strongly consistent. However, these conditions are restrictive in the context
of MCMC. In particular, his Assumption 2 (De Jong, 2000, page 264) will not be satisfied in many
typical MCMC applications. Additionally, we require weaker mixing conditions on the underlying
stochastic process. That is, although Markov chains are our primary focus, our results hold
for much more general stochastic processes, as we explain below.

Our main assumptions on the underlying stochastic process are the SIPs stated in (2.3) and
(2.4). The existence of an SIP has attracted much research interest. Consider the univariate case.
For independent and identically distributed (i.i.d.) processes, the first result of this kind is due to
Strassen (1964), who showed $\psi(n) = \sqrt{n \log \log n}$. Komlos et al. (1975) found that if $E_F|g|^{2+\delta} < \infty$,
then $\psi(n) = n^{1/2-\lambda}$ for some $\lambda > 0$ (often called the KMT bound). Komlos et al. (1975) also showed that
if $g$ has all moments in a neighborhood of 0, then $\psi(n) = \log n$. The results of Komlos et al. (1975)
are the strongest to date in the i.i.d. setting. The main reference for a univariate strong invariance
principle for dependent sequences is Philipp and Stout (1975), who prove bounds similar to those of
Komlos et al. (1975) for a variety of weakly dependent processes including $\phi$-mixing, regenerative
and strongly mixing processes. Also, see Wu (2007) for a univariate strong invariance principle for
certain classes of dependent processes.

Many of the univariate SIPs have been extended to the multivariate setting. For independent
processes, Berkes and Philipp (1979), Einmahl (1989), and Zaitsev (1998) extend the results of
Komlos et al. (1975). For correlated processes, Eberlein (1986) showed the existence of a strong
invariance principle for martingale sequences and Horvath (1984) proved the KMT bound for
multivariate extended renewal processes. For $\phi$-mixing, strongly mixing, and absolutely regular processes,
Kuelbs and Philipp (1980) and Dehling and Philipp (1982) extended the Philipp and Stout (1975)
results to the multivariate case.
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A Strong Consistency of MSVE 

Before we begin the proof of Theorem 1 we note some useful properties of Brownian motion and 
lag windows which will be used often throughout the proof. 


A.1 Brownian Motion


Recall that $\{B(t)\}_{t \ge 0}$ denotes a $p$-dimensional standard Brownian motion and that $B^{(i)}$ denotes the
$i$th component of $B(t)$.


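Lemma 2 (Csorgo and Revesz (1981)). Suppose Condition 2 holds, then for all $\epsilon > 0$ and for
almost all sample paths, there exists $n_0(\epsilon)$ such that for all $n \ge n_0$ and all $i = 1, \dots, p$

$$
\sup_{0 \le t \le n - b_n} \; \sup_{0 \le s \le b_n} \left| B^{(i)}(t+s) - B^{(i)}(t) \right| < (1+\epsilon) \left( 2 b_n \left( \log \frac{n}{b_n} + \log\log n \right) \right)^{1/2},
$$

$$
\sup_{0 \le s \le b_n} \left| B^{(i)}(n) - B^{(i)}(n-s) \right| < (1+\epsilon) \left( 2 b_n \left( \log \frac{n}{b_n} + \log\log n \right) \right)^{1/2}, \text{ and}
$$

$$
\left| B^{(i)}(n) \right| < (1+\epsilon) \sqrt{2 n \log\log n}.
$$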


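Let $L$ be a lower triangular matrix and set $\Sigma = LL^T$. Define $C(t) := LB(t)$ and if $C^{(i)}(t)$ is the
$i$th component of $C(t)$, define

$$
\bar{C}_l^{(i)}(k) := \frac{1}{k} \left( C^{(i)}(l+k) - C^{(i)}(l) \right) \quad \text{and} \quad \bar{C}_n^{(i)} := \frac{1}{n} C^{(i)}(n).
$$

Since $C^{(i)}(t) \sim N(0, t\Sigma_{ii})$, where $\Sigma_{ii}$ is the $i$th diagonal entry of $\Sigma$, $C^{(i)}/\sqrt{\Sigma_{ii}}$ is a 1-dimensional standard
Brownian motion. As a consequence, we have the following corollaries of Lemma 2.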


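Corollary 1. Suppose Condition 2 holds, then for all $\epsilon > 0$ and for almost all sample paths there
exists $n_0(\epsilon)$ such that for all $n \ge n_0$ and all $i = 1, \dots, p$

$$
\left| C^{(i)}(n) \right| < (1+\epsilon) \left( 2 n \Sigma_{ii} \log\log n \right)^{1/2}, \tag{A.1}
$$

where $\Sigma_{ii}$ is the $i$th diagonal entry of $\Sigma$.


Corollary 2. Suppose Condition 2 holds, then for all $\epsilon > 0$ and for almost all sample paths, there
exists $n_0(\epsilon)$ such that for all $n \ge n_0$ and all $i = 1, \dots, p$

$$
\left| \bar{C}_l^{(i)}(k) \right| \le \frac{1}{k} \sup_{0 \le l \le n - b_n} \; \sup_{0 \le s \le b_n} \left| C^{(i)}(l+s) - C^{(i)}(l) \right| < \frac{1}{k}\, 2(1+\epsilon) \left( b_n \Sigma_{ii} \log n \right)^{1/2}, \tag{A.2}
$$

where $\Sigma_{ii}$ is the $i$th diagonal entry of $\Sigma$.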
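The constant in (A.2) can be traced back to Lemma 2 in one line. The following worked step is a sketch, assuming $1 \le b_n \le n$ and $n \ge 3$ so that $\log(n/b_n) \le \log n$ and $\log\log n \le \log n$; it applies Lemma 2 to the standard Brownian motion $C^{(i)}/\sqrt{\Sigma_{ii}}$.

```latex
\begin{align*}
\sup_{0 \le l \le n-b_n} \sup_{0 \le s \le b_n}
  \left| C^{(i)}(l+s) - C^{(i)}(l) \right|
 &< (1+\epsilon) \left( 2 b_n \Sigma_{ii}
      \left( \log \frac{n}{b_n} + \log\log n \right) \right)^{1/2} \\
 &\le (1+\epsilon) \left( 2 b_n \Sigma_{ii} \cdot 2 \log n \right)^{1/2}
  = 2 (1+\epsilon) \left( b_n \Sigma_{ii} \log n \right)^{1/2}.
\end{align*}
```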


A.2 Basic Properties of Lag Windows 

Recall that the lag window $w_n(\cdot)$ is such that it satisfies Condition 1. We will require the following
results about the lag window $w_n(\cdot)$.

Lemma 3 (Damerdji (1991)). Under Condition 1,

(i) $\Delta_1 w_n(s) = \sum_{k=s}^{b_n} \Delta_2 w_n(k)$,

(ii) $\sum_{k=s+1}^{b_n} \Delta_1 w_n(k) = w_n(s)$, and

(iii) $\sum_{k=1}^{b_n} \Delta_1 w_n(k) = 1$.
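The three identities in Lemma 3 are telescoping sums, so they can be checked numerically for a concrete window. The sketch below assumes the first and second differences are $\Delta_1 w_n(k) = w_n(k-1) - w_n(k)$ and $\Delta_2 w_n(k) = w_n(k-1) - 2w_n(k) + w_n(k+1)$ (their definitions are not restated in this appendix) and uses the Bartlett window as an example; the function names are hypothetical.

```python
def bartlett(k, b):
    """Bartlett lag window: 1 - |k|/b for |k| < b, and 0 otherwise."""
    k = abs(k)
    return 1.0 - k / b if k < b else 0.0

def delta1(w, k, b):
    # Assumed definition: first difference of the lag window.
    return w(k - 1, b) - w(k, b)

def delta2(w, k, b):
    # Assumed definition: second difference of the lag window.
    return w(k - 1, b) - 2.0 * w(k, b) + w(k + 1, b)

def check_lemma3(w, b, tol=1e-12):
    """Return True if identities (i)-(iii) of Lemma 3 hold for window w."""
    for s in range(1, b + 1):  # identity (i)
        rhs = sum(delta2(w, k, b) for k in range(s, b + 1))
        if abs(delta1(w, s, b) - rhs) > tol:
            return False
    for s in range(0, b):  # identity (ii)
        lhs = sum(delta1(w, k, b) for k in range(s + 1, b + 1))
        if abs(lhs - w(s, b)) > tol:
            return False
    total = sum(delta1(w, k, b) for k in range(1, b + 1))  # identity (iii)
    return abs(total - 1.0) <= tol
```

The check relies on the window vanishing at lags $b_n$ and $b_n + 1$; a window that does not vanish there (for example a constant one) fails identity (ii).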
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A.3 Proof of Theorem 1 


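Recall that

$$
\hat{\Sigma}_{w,n} = \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{k=1}^{b_n} k^2 \Delta_2 w_n(k) \left[ \bar{Y}_l(k) - \bar{Y}_n \right] \left[ \bar{Y}_l(k) - \bar{Y}_n \right]^T. \tag{A.3}
$$

For $t = 1, 2, \dots, n$, define $Z_t = Y_t - \bar{Y}_n$ and

$$
d_n = \frac{1}{n} \left\{ \sum_{t=1}^{b_n} \Delta_1 w_n(t) \left( \sum_{l=1}^{t-1} Z_l Z_l^T + \sum_{l=n-b_n+t+1}^{n} Z_l Z_l^T \right)
+ \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} \Delta_1 w_n(s+t) \left( \sum_{l=1}^{t-1} \left( Z_l Z_{l+s}^T + Z_{l+s} Z_l^T \right) + \sum_{l=n-b_n+t+1}^{n-s} \left( Z_l Z_{l+s}^T + Z_{l+s} Z_l^T \right) \right) \right\}. \tag{A.4}
$$

Notice that in (A.4) we use the convention that empty sums are zero.

Lemma 4. Under Condition 1, $\hat{\Sigma}_{w,n} = \hat{\Sigma}_S - d_n$.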

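Proof. For $i, j = 1, \dots, p$, let $\hat{\Sigma}_{w,ij}$ denote the $(i,j)$th entry of $\hat{\Sigma}_{w,n}$. Then,

$$
\begin{aligned}
\hat{\Sigma}_{w,ij}
&= \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{k=1}^{b_n} k^2 \Delta_2 w_n(k) \left[ \bar{Y}_l^{(i)}(k) - \bar{Y}_n^{(i)} \right] \left[ \bar{Y}_l^{(j)}(k) - \bar{Y}_n^{(j)} \right] \\
&= \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{k=1}^{b_n} \Delta_2 w_n(k) \left[ \sum_{t=1}^{k} Y_{l+t}^{(i)} - k\bar{Y}_n^{(i)} \right] \left[ \sum_{t=1}^{k} Y_{l+t}^{(j)} - k\bar{Y}_n^{(j)} \right] \\
&= \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{k=1}^{b_n} \Delta_2 w_n(k) \left[ \sum_{t=1}^{k} Z_{l+t}^{(i)} \right] \left[ \sum_{t=1}^{k} Z_{l+t}^{(j)} \right] \\
&= \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{k=1}^{b_n} \Delta_2 w_n(k) \left[ \sum_{t=1}^{k} Z_{l+t}^{(i)} Z_{l+t}^{(j)} + \sum_{s=1}^{k-1} \sum_{t=1}^{k-s} Z_{l+t}^{(i)} Z_{l+t+s}^{(j)} + \sum_{s=1}^{k-1} \sum_{t=1}^{k-s} Z_{l+t}^{(j)} Z_{l+t+s}^{(i)} \right].
\end{aligned}
\tag{A.5}
$$

Notice that in (A.5), we use the convention that empty sums are zero. We will consider each term
in (A.5) separately. For the first term, changing the order of summation and then using Lemma 3,

$$
\begin{aligned}
\frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{k=1}^{b_n} \sum_{t=1}^{k} \Delta_2 w_n(k) Z_{l+t}^{(i)} Z_{l+t}^{(j)}
&= \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{t=1}^{b_n} \sum_{k=t}^{b_n} \Delta_2 w_n(k) Z_{l+t}^{(i)} Z_{l+t}^{(j)} \\
&= \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{t=1}^{b_n} \Delta_1 w_n(t) Z_{l+t}^{(i)} Z_{l+t}^{(j)} \\
&= \frac{1}{n} \sum_{t=1}^{b_n} \Delta_1 w_n(t) \sum_{l=0}^{n-b_n} Z_{l+t}^{(i)} Z_{l+t}^{(j)} \\
&= \sum_{t=1}^{b_n} \Delta_1 w_n(t) \left[ \gamma_{n,ij}(0) - \frac{1}{n} \left( \sum_{l=1}^{t-1} Z_l^{(i)} Z_l^{(j)} + \sum_{l=n-b_n+t+1}^{n} Z_l^{(i)} Z_l^{(j)} \right) \right] \\
&= \gamma_{n,ij}(0) - \frac{1}{n} \sum_{t=1}^{b_n} \Delta_1 w_n(t) \left( \sum_{l=1}^{t-1} Z_l^{(i)} Z_l^{(j)} + \sum_{l=n-b_n+t+1}^{n} Z_l^{(i)} Z_l^{(j)} \right) \quad \text{by Lemma 3.}
\end{aligned}
\tag{A.6}
$$

For the second term in (A.5) we change the order of summation from $l, k, s, t$ to $l, s, k, t$ to $l, s, t, k$
to get

$$
\begin{aligned}
\frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{k=1}^{b_n} \sum_{s=1}^{k-1} \sum_{t=1}^{k-s} \Delta_2 w_n(k) Z_{l+t}^{(i)} Z_{l+t+s}^{(j)}
&= \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{s=1}^{b_n-1} \sum_{k=s+1}^{b_n} \sum_{t=1}^{k-s} \Delta_2 w_n(k) Z_{l+t}^{(i)} Z_{l+t+s}^{(j)} \\
&= \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} \sum_{k=t+s}^{b_n} \Delta_2 w_n(k) Z_{l+t}^{(i)} Z_{l+t+s}^{(j)} \\
&= \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} \Delta_1 w_n(s+t) Z_{l+t}^{(i)} Z_{l+t+s}^{(j)} \quad \text{by Lemma 3} \\
&= \frac{1}{n} \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} \Delta_1 w_n(s+t) \sum_{l=0}^{n-b_n} Z_{l+t}^{(i)} Z_{l+t+s}^{(j)} \\
&= \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} \Delta_1 w_n(s+t) \left[ \gamma_{n,ij}(s) - \frac{1}{n} \sum_{l=1}^{t-1} Z_l^{(i)} Z_{l+s}^{(j)} - \frac{1}{n} \sum_{l=n-b_n+t+1}^{n-s} Z_l^{(i)} Z_{l+s}^{(j)} \right] \\
&= \sum_{s=1}^{b_n-1} w_n(s) \gamma_{n,ij}(s) - \frac{1}{n} \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} \Delta_1 w_n(s+t) \left( \sum_{l=1}^{t-1} Z_l^{(i)} Z_{l+s}^{(j)} + \sum_{l=n-b_n+t+1}^{n-s} Z_l^{(i)} Z_{l+s}^{(j)} \right) \quad \text{by Lemma 3.}
\end{aligned}
\tag{A.7}
$$

Repeating the same steps as in the second term we reduce the third term in (A.5) to

$$
\begin{aligned}
\frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{k=1}^{b_n} \sum_{s=1}^{k-1} \sum_{t=1}^{k-s} \Delta_2 w_n(k) Z_{l+t}^{(j)} Z_{l+t+s}^{(i)}
&= \sum_{s=1}^{b_n-1} w_n(s) \gamma_{n,ji}(s) - \frac{1}{n} \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} \Delta_1 w_n(s+t) \left( \sum_{l=1}^{t-1} Z_l^{(j)} Z_{l+s}^{(i)} + \sum_{l=n-b_n+t+1}^{n-s} Z_l^{(j)} Z_{l+s}^{(i)} \right) \\
&= \sum_{s=1}^{b_n-1} w_n(s) \gamma_{n,ij}(-s) - \frac{1}{n} \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} \Delta_1 w_n(s+t) \left( \sum_{l=1}^{t-1} Z_l^{(j)} Z_{l+s}^{(i)} + \sum_{l=n-b_n+t+1}^{n-s} Z_l^{(j)} Z_{l+s}^{(i)} \right).
\end{aligned}
\tag{A.8}
$$

Using (A.6), (A.7), and (A.8) in (A.5),

$$
\begin{aligned}
\hat{\Sigma}_{w,ij}
&= \gamma_{n,ij}(0) + \sum_{s=1}^{b_n-1} w_n(s) \gamma_{n,ij}(s) + \sum_{s=-(b_n-1)}^{-1} w_n(s) \gamma_{n,ij}(s) \\
&\quad - \frac{1}{n} \sum_{t=1}^{b_n} \Delta_1 w_n(t) \left( \sum_{l=1}^{t-1} Z_l^{(i)} Z_l^{(j)} + \sum_{l=n-b_n+t+1}^{n} Z_l^{(i)} Z_l^{(j)} \right) \\
&\quad - \frac{1}{n} \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} \Delta_1 w_n(s+t) \left[ \sum_{l=1}^{t-1} \left( Z_l^{(i)} Z_{l+s}^{(j)} + Z_{l+s}^{(i)} Z_l^{(j)} \right) + \sum_{l=n-b_n+t+1}^{n-s} \left( Z_l^{(i)} Z_{l+s}^{(j)} + Z_{l+s}^{(i)} Z_l^{(j)} \right) \right] \\
&= \sum_{s=-(b_n-1)}^{b_n-1} \gamma_{n,ij}(s) w_n(s) - d_{n,ij} \\
&= \hat{\Sigma}_{S,ij} - d_{n,ij}.
\end{aligned}
$$

□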
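Lemma 4 is an exact algebraic identity, so it can be verified numerically on arbitrary data. The sketch below is an illustration under assumptions: it takes $\gamma_n(s) = n^{-1} \sum_{t=1}^{n-s} Z_t Z_{t+s}^T$ with $\gamma_n(-s) = \gamma_n(s)^T$ and $\hat{\Sigma}_S = \sum_{s=-(b_n-1)}^{b_n-1} w_n(s)\gamma_n(s)$ as the forms of (2.1) and (2.2), assumes the usual first and second differences of the window, and uses the Bartlett window. All function names are hypothetical.

```python
import numpy as np

def bartlett(k, b):
    k = abs(k)
    return 1.0 - k / b if k < b else 0.0

def d1(w, k, b):  # assumed first difference of the lag window
    return w(k - 1, b) - w(k, b)

def d2(w, k, b):  # assumed second difference of the lag window
    return w(k - 1, b) - 2.0 * w(k, b) + w(k + 1, b)

def weighted_form(Y, b, w):
    """Right-hand side of (A.3)."""
    n, p = Y.shape
    Z = Y - Y.mean(axis=0)
    S = np.vstack([np.zeros(p), np.cumsum(Z, axis=0)])  # S[m] = Z_1 + ... + Z_m
    out = np.zeros((p, p))
    for l in range(n - b + 1):
        for k in range(1, b + 1):
            v = S[l + k] - S[l]  # equals k * (Ybar_l(k) - Ybar_n)
            out += d2(w, k, b) * np.outer(v, v)
    return out / n

def spectral_form(Y, b, w):
    """Assumed form of (2.2): lag-window weighted sample autocovariances."""
    n, _ = Y.shape
    Z = Y - Y.mean(axis=0)
    out = Z.T @ Z / n
    for s in range(1, b):
        g = Z[: n - s].T @ Z[s:] / n  # gamma_n(s)
        out += w(s, b) * (g + g.T)
    return out

def end_term(Y, b, w):
    """d_n as in (A.4); empty sums are zero."""
    n, p = Y.shape
    Z = Y - Y.mean(axis=0)

    def cross(lo, hi, s):  # sum_{l=lo}^{hi} Z_l Z_{l+s}^T with 1-based l
        if hi < lo:
            return np.zeros((p, p))
        return Z[lo - 1 : hi].T @ Z[lo - 1 + s : hi + s]

    d = np.zeros((p, p))
    for t in range(1, b + 1):
        d += d1(w, t, b) * (cross(1, t - 1, 0) + cross(n - b + t + 1, n, 0))
    for s in range(1, b):
        for t in range(1, b - s + 1):
            c = cross(1, t - 1, s) + cross(n - b + t + 1, n - s, s)
            d += d1(w, s + t, b) * (c + c.T)
    return d / n
```

For any data matrix `Y` with rows $Y_1, \dots, Y_n$, `weighted_form(Y, b, w)` should agree with `spectral_form(Y, b, w) - end_term(Y, b, w)` up to floating-point error.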

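Let $\tilde{\gamma}_n(s)$, $\tilde{\Sigma}_S$, $\tilde{\Sigma}_{w,n}$ and $\tilde{d}_n$ be the Brownian motion analogs of (2.1), (2.2), (A.3), and (A.4).
Specifically, for $t = 1, \dots, n$ define Brownian motion increments $U_t = B(t) - B(t-1)$, so that
$U_1, \dots, U_n$ are i.i.d. $N_p(0, I_p)$ where $I_p$ is the $p \times p$ identity matrix. For $l = 0, \dots, n - b_n$ and $k = 1, \dots, b_n$
define $\bar{B}_l(k) = k^{-1}(B(l+k) - B(l))$, $\bar{B}_n = n^{-1}B(n)$, and $T_t = U_t - \bar{B}_n$. Then

$$
\tilde{\gamma}_n(s) = \frac{1}{n} \sum_{t \in I_s} (U_t - \bar{B}_n)(U_{t+s} - \bar{B}_n)^T = \frac{1}{n} \sum_{t \in I_s} T_t T_{t+s}^T, \tag{A.9}
$$

$$
\tilde{\Sigma}_S = \sum_{s=-(b_n-1)}^{b_n-1} w_n(s) \tilde{\gamma}_n(s), \tag{A.10}
$$

$$
\tilde{\Sigma}_{w,n} = \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{k=1}^{b_n} k^2 \Delta_2 w_n(k) \left[ \bar{B}_l(k) - \bar{B}_n \right] \left[ \bar{B}_l(k) - \bar{B}_n \right]^T, \tag{A.11}
$$

$$
\tilde{d}_n = \frac{1}{n} \left\{ \sum_{t=1}^{b_n} \Delta_1 w_n(t) \left( \sum_{l=1}^{t-1} T_l T_l^T + \sum_{l=n-b_n+t+1}^{n} T_l T_l^T \right)
+ \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} \Delta_1 w_n(s+t) \left( \sum_{l=1}^{t-1} \left( T_l T_{l+s}^T + T_{l+s} T_l^T \right) + \sum_{l=n-b_n+t+1}^{n-s} \left( T_l T_{l+s}^T + T_{l+s} T_l^T \right) \right) \right\}. \tag{A.12}
$$

Notice that in (A.12) we use the convention that empty sums are zero. Our goal is to show
that $\tilde{\Sigma}_{w,n} \to I_p$ as $n \to \infty$ with probability 1 in the following way. In Lemma 5 we show that
$\tilde{\Sigma}_{w,n} = \tilde{\Sigma}_S - \tilde{d}_n$ and in Lemma 7 we show that the end term $\tilde{d}_n \to 0$ as $n \to \infty$ with probability 1.
Lemma 12 shows that $\tilde{\Sigma}_S \to I_p$ as $n \to \infty$ with probability 1, and hence $\tilde{\Sigma}_{w,n} \to I_p$ as $n \to \infty$ with
probability 1.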

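Lemma 5. Under Condition 1, $\tilde{\Sigma}_{w,n} = \tilde{\Sigma}_S - \tilde{d}_n$.

Proof. For $i, j = 1, \dots, p$, let $\tilde{\Sigma}_{w,ij}$ denote the $(i,j)$th entry of $\tilde{\Sigma}_{w,n}$. Then,

$$
\begin{aligned}
\tilde{\Sigma}_{w,ij}
&= \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{k=1}^{b_n} k^2 \Delta_2 w_n(k) \left[ \bar{B}_l^{(i)}(k) - \bar{B}_n^{(i)} \right] \left[ \bar{B}_l^{(j)}(k) - \bar{B}_n^{(j)} \right] \\
&= \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{k=1}^{b_n} \Delta_2 w_n(k) \left[ B^{(i)}(k+l) - B^{(i)}(l) - k\bar{B}_n^{(i)} \right] \left[ B^{(j)}(k+l) - B^{(j)}(l) - k\bar{B}_n^{(j)} \right] \\
&= \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{k=1}^{b_n} \Delta_2 w_n(k) \left[ \sum_{t=1}^{k} U_{l+t}^{(i)} - k\bar{B}_n^{(i)} \right] \left[ \sum_{t=1}^{k} U_{l+t}^{(j)} - k\bar{B}_n^{(j)} \right] \\
&= \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{k=1}^{b_n} \Delta_2 w_n(k) \left[ \sum_{t=1}^{k} T_{l+t}^{(i)} \right] \left[ \sum_{t=1}^{k} T_{l+t}^{(j)} \right] \\
&= \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{k=1}^{b_n} \Delta_2 w_n(k) \left[ \sum_{t=1}^{k} T_{l+t}^{(i)} T_{l+t}^{(j)} + \sum_{s=1}^{k-1} \sum_{t=1}^{k-s} T_{l+t}^{(i)} T_{l+t+s}^{(j)} + \sum_{s=1}^{k-1} \sum_{t=1}^{k-s} T_{l+t}^{(j)} T_{l+t+s}^{(i)} \right].
\end{aligned}
\tag{A.13}
$$

In (A.13), we continue to use the convention that empty sums are zero. We will look at each of the
terms in (A.13) separately. For the first term, changing the order of summation and then using
Lemma 3,

$$
\begin{aligned}
\frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{k=1}^{b_n} \sum_{t=1}^{k} \Delta_2 w_n(k) T_{l+t}^{(i)} T_{l+t}^{(j)}
&= \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{t=1}^{b_n} \sum_{k=t}^{b_n} \Delta_2 w_n(k) T_{l+t}^{(i)} T_{l+t}^{(j)} \\
&= \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{t=1}^{b_n} \Delta_1 w_n(t) T_{l+t}^{(i)} T_{l+t}^{(j)} \\
&= \frac{1}{n} \sum_{t=1}^{b_n} \Delta_1 w_n(t) \sum_{l=0}^{n-b_n} T_{l+t}^{(i)} T_{l+t}^{(j)} \\
&= \tilde{\gamma}_{n,ij}(0) - \frac{1}{n} \sum_{t=1}^{b_n} \Delta_1 w_n(t) \left( \sum_{l=1}^{t-1} T_l^{(i)} T_l^{(j)} + \sum_{l=n-b_n+t+1}^{n} T_l^{(i)} T_l^{(j)} \right).
\end{aligned}
\tag{A.14}
$$

For the second term in (A.13) we change the order of summation from $l, k, s, t$ to $l, s, k, t$ to
$l, s, t, k$ and use Lemma 3 to get

$$
\begin{aligned}
\frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{k=1}^{b_n} \sum_{s=1}^{k-1} \sum_{t=1}^{k-s} \Delta_2 w_n(k) T_{l+t}^{(i)} T_{l+t+s}^{(j)}
&= \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{s=1}^{b_n-1} \sum_{k=s+1}^{b_n} \sum_{t=1}^{k-s} \Delta_2 w_n(k) T_{l+t}^{(i)} T_{l+t+s}^{(j)} \\
&= \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} \sum_{k=t+s}^{b_n} \Delta_2 w_n(k) T_{l+t}^{(i)} T_{l+t+s}^{(j)} \\
&= \frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} \Delta_1 w_n(s+t) T_{l+t}^{(i)} T_{l+t+s}^{(j)} \\
&= \frac{1}{n} \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} \Delta_1 w_n(s+t) \sum_{l=0}^{n-b_n} T_{l+t}^{(i)} T_{l+t+s}^{(j)} \\
&= \frac{1}{n} \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} \Delta_1 w_n(s+t) \sum_{l=t}^{n-b_n+t} T_l^{(i)} T_{l+s}^{(j)} \\
&= \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} \Delta_1 w_n(s+t) \left[ \tilde{\gamma}_{n,ij}(s) - \frac{1}{n} \sum_{l=1}^{t-1} T_l^{(i)} T_{l+s}^{(j)} - \frac{1}{n} \sum_{l=n-b_n+t+1}^{n-s} T_l^{(i)} T_{l+s}^{(j)} \right] \\
&= \sum_{s=1}^{b_n-1} w_n(s) \tilde{\gamma}_{n,ij}(s) - \frac{1}{n} \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} \Delta_1 w_n(s+t) \left( \sum_{l=1}^{t-1} T_l^{(i)} T_{l+s}^{(j)} + \sum_{l=n-b_n+t+1}^{n-s} T_l^{(i)} T_{l+s}^{(j)} \right).
\end{aligned}
\tag{A.15}
$$

Repeating the same steps as in the second term we reduce the third term in (A.13) to

$$
\begin{aligned}
\frac{1}{n} \sum_{l=0}^{n-b_n} \sum_{k=1}^{b_n} \sum_{s=1}^{k-1} \sum_{t=1}^{k-s} \Delta_2 w_n(k) T_{l+t}^{(j)} T_{l+t+s}^{(i)}
&= \sum_{s=1}^{b_n-1} w_n(s) \tilde{\gamma}_{n,ji}(s) - \frac{1}{n} \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} \Delta_1 w_n(s+t) \left( \sum_{l=1}^{t-1} T_l^{(j)} T_{l+s}^{(i)} + \sum_{l=n-b_n+t+1}^{n-s} T_l^{(j)} T_{l+s}^{(i)} \right) \\
&= \sum_{s=1}^{b_n-1} w_n(s) \tilde{\gamma}_{n,ij}(-s) - \frac{1}{n} \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} \Delta_1 w_n(s+t) \left( \sum_{l=1}^{t-1} T_l^{(j)} T_{l+s}^{(i)} + \sum_{l=n-b_n+t+1}^{n-s} T_l^{(j)} T_{l+s}^{(i)} \right).
\end{aligned}
\tag{A.16}
$$

Using (A.14), (A.15), and (A.16) in (A.13), we get

$$
\begin{aligned}
\tilde{\Sigma}_{w,ij}
&= \tilde{\gamma}_{n,ij}(0) + \sum_{s=1}^{b_n-1} w_n(s) \tilde{\gamma}_{n,ij}(s) + \sum_{s=-(b_n-1)}^{-1} w_n(s) \tilde{\gamma}_{n,ij}(s) \\
&\quad - \frac{1}{n} \sum_{t=1}^{b_n} \Delta_1 w_n(t) \left( \sum_{l=1}^{t-1} T_l^{(i)} T_l^{(j)} + \sum_{l=n-b_n+t+1}^{n} T_l^{(i)} T_l^{(j)} \right) \\
&\quad - \frac{1}{n} \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} \Delta_1 w_n(s+t) \left[ \sum_{l=1}^{t-1} \left( T_l^{(i)} T_{l+s}^{(j)} + T_{l+s}^{(i)} T_l^{(j)} \right) + \sum_{l=n-b_n+t+1}^{n-s} \left( T_l^{(i)} T_{l+s}^{(j)} + T_{l+s}^{(i)} T_l^{(j)} \right) \right] \\
&= \sum_{s=-(b_n-1)}^{b_n-1} \tilde{\gamma}_{n,ij}(s) w_n(s) - \tilde{d}_{n,ij} \\
&= \tilde{\Sigma}_{S,ij} - \tilde{d}_{n,ij}.
\end{aligned}
$$

□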


Next, we show that as $n \to \infty$, $\tilde{d}_n \to 0$ with probability 1, implying $\tilde{\Sigma}_{w,n} - \tilde{\Sigma}_S \to 0$ with
probability 1 as $n \to \infty$. To do so we require a strong invariance principle for independent and
identically distributed random variables.

Theorem 4 (Komlos et al. (1975)). Let $B(n)$ be a 1-dimensional standard Brownian motion. If
$X_1, X_2, X_3, \dots$ are independent and identically distributed univariate random variables with mean $\mu$
and standard deviation $\sigma$, such that $E[e^{|tX_1|}] < \infty$ in a neighborhood of $t = 0$, then as $n \to \infty$

$$
\sum_{i=1}^{n} X_i - n\mu - \sigma B(n) = O(\log n).
$$

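We begin with a technical lemma that will be used in a couple of places in the rest of the proof.

Lemma 6. Let Conditions 1 and 2 hold. If, as $n \to \infty$, $b_n n^{-1} \sum_{k=1}^{b_n} k |\Delta_1 w_n(k)| \to 0$, then

$$
\frac{b_n}{n} \left( \sum_{t=1}^{b_n} |\Delta_1 w_n(t)| + 2 \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} |\Delta_1 w_n(s+t)| \right) \to 0.
$$

Proof.

$$
\begin{aligned}
\frac{b_n}{n} \left( \sum_{t=1}^{b_n} |\Delta_1 w_n(t)| + 2 \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} |\Delta_1 w_n(s+t)| \right)
&= \frac{b_n}{n} \left( \sum_{t=1}^{b_n} |\Delta_1 w_n(t)| + 2 \sum_{s=1}^{b_n-1} \sum_{k=s+1}^{b_n} |\Delta_1 w_n(k)| \right) \\
&= \frac{b_n}{n} \left( \sum_{t=1}^{b_n} |\Delta_1 w_n(t)| + 2 \sum_{k=2}^{b_n} \sum_{s=1}^{k-1} |\Delta_1 w_n(k)| \right) \\
&= \frac{b_n}{n} \left( \sum_{t=1}^{b_n} |\Delta_1 w_n(t)| + 2 \sum_{k=2}^{b_n} (k-1) |\Delta_1 w_n(k)| \right) \\
&= \frac{b_n}{n} \left( \sum_{t=1}^{b_n} |\Delta_1 w_n(t)| + 2 \sum_{k=1}^{b_n} (k-1) |\Delta_1 w_n(k)| \right) \\
&\le \frac{b_n}{n} \left( 2 \sum_{k=1}^{b_n} k |\Delta_1 w_n(k)| \right) \\
&\to 0 \quad \text{by assumption.}
\end{aligned}
$$

□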
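The re-indexing step in the proof of Lemma 6, which collapses the double sum over $(s, t)$ into a single sum weighted by $k - 1$, can be checked numerically. The sketch below uses the Tukey-Hanning window $w_n(k) = (1 + \cos(\pi |k| / b_n))/2$ for $|k| < b_n$ as an example and assumes $\Delta_1 w_n(k) = w_n(k-1) - w_n(k)$; the function names are hypothetical.

```python
import math

def tukey_hanning(k, b):
    """Tukey-Hanning lag window, zero at and beyond lag b."""
    k = abs(k)
    return 0.5 * (1.0 + math.cos(math.pi * k / b)) if k < b else 0.0

def delta1(w, k, b):
    # Assumed definition: first difference of the lag window.
    return w(k - 1, b) - w(k, b)

def lemma6_sums(w, b):
    """Return the pieces that appear in the proof of Lemma 6."""
    single = sum(abs(delta1(w, t, b)) for t in range(1, b + 1))
    double = sum(abs(delta1(w, s + t, b))
                 for s in range(1, b) for t in range(1, b - s + 1))
    collapsed = sum((k - 1) * abs(delta1(w, k, b)) for k in range(2, b + 1))
    bound = 2.0 * sum(k * abs(delta1(w, k, b)) for k in range(1, b + 1))
    return single, double, collapsed, bound
```

Here `double` and `collapsed` should coincide, and `single + 2 * double` should not exceed `bound`, which is the quantity the lemma's assumption controls after multiplying by $b_n / n$.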


Lemma 7. Let Conditions 1 and 2 hold and let $n > 2b_n$. If $b_n n^{-1} \sum_{k=1}^{b_n} k |\Delta_1 w_n(k)| \to 0$ and
$b_n^{-1} \log n = O(1)$ as $n \to \infty$, then $\tilde{d}_n \to 0$ with probability 1 as $n \to \infty$.


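Proof. For $i, j = 1, \dots, p$, we will show that $\tilde{d}_{n,ij} \to 0$ with probability 1 as $n \to \infty$. Recall

$$
\tilde{d}_{n,ij} = \frac{1}{n} \sum_{t=1}^{b_n} \Delta_1 w_n(t) \left( \sum_{l=1}^{t-1} T_l^{(i)} T_l^{(j)} + \sum_{l=n-b_n+t+1}^{n} T_l^{(i)} T_l^{(j)} \right)
+ \frac{1}{n} \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} \Delta_1 w_n(s+t) \left[ \sum_{l=1}^{t-1} \left( T_l^{(i)} T_{l+s}^{(j)} + T_{l+s}^{(i)} T_l^{(j)} \right) + \sum_{l=n-b_n+t+1}^{n-s} \left( T_l^{(i)} T_{l+s}^{(j)} + T_{l+s}^{(i)} T_l^{(j)} \right) \right],
$$

so that

$$
\begin{aligned}
|\tilde{d}_{n,ij}| &\le \frac{1}{n} \sum_{t=1}^{b_n} |\Delta_1 w_n(t)| \left( \sum_{l=1}^{t-1} \left| T_l^{(i)} T_l^{(j)} \right| + \sum_{l=n-b_n+t+1}^{n} \left| T_l^{(i)} T_l^{(j)} \right| \right) \\
&\quad + \frac{1}{n} \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} |\Delta_1 w_n(s+t)| \left[ \sum_{l=1}^{t-1} \left( \left| T_l^{(i)} T_{l+s}^{(j)} \right| + \left| T_{l+s}^{(i)} T_l^{(j)} \right| \right) + \sum_{l=n-b_n+t+1}^{n-s} \left( \left| T_l^{(i)} T_{l+s}^{(j)} \right| + \left| T_{l+s}^{(i)} T_l^{(j)} \right| \right) \right],
\end{aligned}
\tag{A.17}
$$

where we use the convention that empty sums are zero. Using the inequality $|ab| \le (a^2 + b^2)/2$ in
the first and second terms in (A.17), we have for $t = 1, \dots, b_n$

$$
\sum_{l=1}^{t-1} \left| T_l^{(i)} T_l^{(j)} \right| \le \frac{1}{2} \sum_{l=1}^{t-1} \left( T_l^{(i)2} + T_l^{(j)2} \right) \le \frac{1}{2} \sum_{l=1}^{2b_n} T_l^{(i)2} + \frac{1}{2} \sum_{l=1}^{2b_n} T_l^{(j)2},
$$

$$
\sum_{l=n-b_n+t+1}^{n} \left| T_l^{(i)} T_l^{(j)} \right| \le \frac{1}{2} \sum_{l=n-b_n+t+1}^{n} \left( T_l^{(i)2} + T_l^{(j)2} \right) \le \frac{1}{2} \sum_{l=n-2b_n+1}^{n} T_l^{(i)2} + \frac{1}{2} \sum_{l=n-2b_n+1}^{n} T_l^{(j)2}.
$$

Similarly, for the third and fourth terms in (A.17), for $t = 1, \dots, b_n - 1$ and $s = 1, \dots, b_n - 1$

$$
\sum_{l=1}^{t-1} \left| T_l^{(i)} T_{l+s}^{(j)} \right| + \sum_{l=1}^{t-1} \left| T_{l+s}^{(i)} T_l^{(j)} \right|
\le \frac{1}{2} \sum_{l=1}^{t-1} \left( T_l^{(i)2} + T_{l+s}^{(j)2} + T_{l+s}^{(i)2} + T_l^{(j)2} \right)
\le \sum_{l=1}^{2b_n} T_l^{(i)2} + \sum_{l=1}^{2b_n} T_l^{(j)2},
$$

$$
\begin{aligned}
\sum_{l=n-b_n+t+1}^{n-s} \left| T_l^{(i)} T_{l+s}^{(j)} \right| + \sum_{l=n-b_n+t+1}^{n-s} \left| T_{l+s}^{(i)} T_l^{(j)} \right|
&\le \frac{1}{2} \sum_{l=n-2b_n+1}^{n} \left( T_l^{(i)2} + T_l^{(j)2} \right) + \frac{1}{2} \sum_{l=n-b_n+1}^{n} \left( T_l^{(j)2} + T_l^{(i)2} \right) \\
&\le \frac{1}{2} \sum_{l=n-2b_n+1}^{n} \left( T_l^{(i)2} + T_l^{(j)2} \right) + \frac{1}{2} \sum_{l=n-2b_n+1}^{n} \left( T_l^{(j)2} + T_l^{(i)2} \right) \\
&= \sum_{l=n-2b_n+1}^{n} T_l^{(i)2} + \sum_{l=n-2b_n+1}^{n} T_l^{(j)2}.
\end{aligned}
$$

Combining the above results in (A.17) we get,

$$
\begin{aligned}
|\tilde{d}_{n,ij}| &\le \frac{1}{n} \left( \frac{1}{2} \sum_{l=1}^{2b_n} T_l^{(i)2} + \frac{1}{2} \sum_{l=1}^{2b_n} T_l^{(j)2} + \frac{1}{2} \sum_{l=n-2b_n+1}^{n} T_l^{(i)2} + \frac{1}{2} \sum_{l=n-2b_n+1}^{n} T_l^{(j)2} \right) \sum_{t=1}^{b_n} |\Delta_1 w_n(t)| \\
&\quad + \frac{1}{n} \left( \sum_{l=1}^{2b_n} T_l^{(i)2} + \sum_{l=1}^{2b_n} T_l^{(j)2} + \sum_{l=n-2b_n+1}^{n} T_l^{(i)2} + \sum_{l=n-2b_n+1}^{n} T_l^{(j)2} \right) \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} |\Delta_1 w_n(s+t)| \\
&= \frac{1}{b_n} \left( \frac{1}{2} \sum_{l=1}^{2b_n} T_l^{(i)2} + \frac{1}{2} \sum_{l=1}^{2b_n} T_l^{(j)2} + \frac{1}{2} \sum_{l=n-2b_n+1}^{n} T_l^{(i)2} + \frac{1}{2} \sum_{l=n-2b_n+1}^{n} T_l^{(j)2} \right)
\cdot \frac{b_n}{n} \left( \sum_{t=1}^{b_n} |\Delta_1 w_n(t)| + 2 \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} |\Delta_1 w_n(s+t)| \right).
\end{aligned}
\tag{A.18}
$$

We will show that the first term in the product in (A.18) remains bounded with probability 1
as $n \to \infty$. Consider,

$$
\begin{aligned}
\frac{1}{2b_n} \sum_{l=1}^{2b_n} T_l^{(i)2}
&= \frac{1}{2b_n} \sum_{l=1}^{2b_n} \left( U_l^{(i)} - \bar{B}_n^{(i)} \right)^2
= \frac{1}{2b_n} \sum_{l=1}^{2b_n} U_l^{(i)2} - 2 \bar{B}_n^{(i)} \frac{1}{2b_n} \sum_{l=1}^{2b_n} U_l^{(i)} + \left( \bar{B}_n^{(i)} \right)^2 \\
&\le \left| \frac{1}{2b_n} \sum_{l=1}^{2b_n} U_l^{(i)2} \right| + 2 \left| \bar{B}_n^{(i)} \right| \left| \frac{1}{2b_n} \sum_{l=1}^{2b_n} U_l^{(i)} \right| + \left| \bar{B}_n^{(i)} \right|^2 \\
&< \left| \frac{1}{2b_n} \sum_{l=1}^{2b_n} U_l^{(i)2} \right| + 2 \left| \frac{1}{2b_n} \sum_{l=1}^{2b_n} U_l^{(i)} \right| \frac{1}{n} (1+\epsilon)(2n \log\log n)^{1/2} + \left( \frac{1}{n} (1+\epsilon)(2n \log\log n)^{1/2} \right)^2 \quad \text{by Lemma 2} \\
&\le \left| \frac{1}{2b_n} \sum_{l=1}^{2b_n} U_l^{(i)2} \right| + \left| \frac{1}{2b_n} \sum_{l=1}^{2b_n} U_l^{(i)} \right| O\left( (n^{-1} \log n)^{1/2} \right) + O\left( n^{-1} \log n \right).
\end{aligned}
$$

Since $U_l^{(i)}$ are Brownian motion increments, $U_l^{(i)} \overset{iid}{\sim} N(0,1)$ and by the classical strong law of large
numbers, the above remains bounded with probability 1. Similarly $(2b_n)^{-1} \sum_{l=1}^{2b_n} T_l^{(j)2}$ remains
bounded with probability 1 as $n \to \infty$. Next, consider $R_n = \sum_{l=1}^{n} U_l^{(i)2}$. Since $U_l^{(i)} \overset{iid}{\sim} N(0,1)$,
$R_n \sim \chi^2_n$. Thus $R_n$ has a moment generating function and an application of Theorem 4 implies
there exists a finite random variable $C_R$ such that, for sufficiently large $n$,

$$
\left| R_n - n - 2B^{(i)}(n) \right| < C_R \log n. \tag{A.19}
$$

Consider

$$
\begin{aligned}
|R_n - R_{n-2b_n}|
&= \left| \left( R_n - n - 2B^{(i)}(n) \right) - \left( R_{n-2b_n} - (n - 2b_n) - 2B^{(i)}(n - 2b_n) \right) - (n - 2b_n) + n + 2B^{(i)}(n) - 2B^{(i)}(n - 2b_n) \right| \\
&\le \left| R_n - n - 2B^{(i)}(n) \right| + \left| R_{n-2b_n} - (n - 2b_n) - 2B^{(i)}(n - 2b_n) \right| + \left| 2b_n + 2B^{(i)}(n) - 2B^{(i)}(n - 2b_n) \right| \\
&< C_R \log n + C_R \log(n - 2b_n) + 2b_n + 2(1+\epsilon) \left( 2(2b_n) \left( \log \frac{n}{2b_n} + \log\log n \right) \right)^{1/2} \quad \text{by (A.19) and Lemma 2} \\
&< 2C_R \log n + 2b_n + 4(1+\epsilon)(2b_n \log n)^{1/2}.
\end{aligned}
\tag{A.20}
$$

Finally,

$$
\begin{aligned}
\frac{1}{2b_n} \sum_{l=n-2b_n+1}^{n} T_l^{(i)2}
&= \frac{1}{2b_n} \sum_{l=n-2b_n+1}^{n} \left( U_l^{(i)} - \bar{B}_n^{(i)} \right)^2 \\
&= \frac{1}{2b_n} \sum_{l=n-2b_n+1}^{n} U_l^{(i)2} - 2\bar{B}_n^{(i)} \frac{1}{2b_n} \sum_{l=n-2b_n+1}^{n} U_l^{(i)} + \left( \bar{B}_n^{(i)} \right)^2 \\
&= \frac{1}{2b_n} \left( R_n - R_{n-2b_n} \right) - \frac{2}{n} \frac{1}{2b_n} B^{(i)}(n) \left( B^{(i)}(n) - B^{(i)}(n - 2b_n) \right) + \left( \frac{1}{n} B^{(i)}(n) \right)^2 \\
&\le \frac{1}{2b_n} \left| R_n - R_{n-2b_n} \right| + \frac{2}{2b_n n} \left| B^{(i)}(n) \right| \left| B^{(i)}(n) - B^{(i)}(n - 2b_n) \right| + \left| \frac{1}{n} B^{(i)}(n) \right|^2 \\
&< \frac{1}{2b_n} \left[ 2C_R \log n + 2b_n + 4(1+\epsilon)(2b_n \log n)^{1/2} \right] \\
&\quad + \frac{1}{b_n n} (1+\epsilon)(2n \log\log n)^{1/2} (1+\epsilon) \left( 2(2b_n) \left( \log \frac{n}{2b_n} + \log\log n \right) \right)^{1/2} \\
&\quad + \left( \frac{1}{n} (1+\epsilon)(2n \log\log n)^{1/2} \right)^2 \quad \text{by (A.20) and Lemma 2} \\
&< C_R b_n^{-1} \log n + 1 + \frac{4}{2b_n} (1+\epsilon)(2b_n \log n)^{1/2} + \frac{1}{n b_n} (1+\epsilon)^2 (2n \log n)^{1/2} (8b_n \log n)^{1/2} + \frac{(1+\epsilon)^2}{n} (2 \log n) \\
&= C_R b_n^{-1} \log n + 1 + 2(1+\epsilon)(2b_n^{-1} \log n)^{1/2} + 4(1+\epsilon)^2 (b_n^{-1} \log n)^{1/2} \left( \frac{\log n}{n} \right)^{1/2} + 2(1+\epsilon)^2 \frac{\log n}{n}.
\end{aligned}
$$

Since $b_n^{-1} \log n = O(1)$ as $n \to \infty$, the above term remains bounded with probability 1 as $n \to \infty$.
Similarly, $(2b_n)^{-1} \sum_{l=n-2b_n+1}^{n} T_l^{(j)2}$ remains bounded with probability 1 as $n \to \infty$. The second
term in the product in (A.18) converges to 0 by Lemma 6 and hence $\tilde{d}_{n,ij} \to 0$ with probability 1
as $n \to \infty$. □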

Recall that $h(X_t) = Y_t^2$ for $t = 1, 2, 3, \dots$, where the square is element-wise.

Lemma 8. Let a strong invariance principle for $h$ hold as in (2.4). If Condition 2 holds, $b_n^{-1} \psi_h(n) \to 0$
and $b_n^{-1} \log n = O(1)$ as $n \to \infty$, then

$$
\frac{1}{b_n} \sum_{k=1}^{b_n} h(X_k) \quad \text{and} \quad \frac{1}{b_n} \sum_{k=n-b_n+1}^{n} h(X_k)
$$

stay bounded with probability 1 as $n \to \infty$.


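Proof. Equation (2.4) implies that $b_n^{-1} \sum_{k=1}^{b_n} h(X_k) \to E_F h$ if $b_n^{-1} \psi_h(b_n) \to 0$ as $n \to \infty$. Since by
assumption $b_n^{-1} \psi_h(n) \to 0$ as $n \to \infty$ and $\psi_h$ is increasing, $b_n^{-1} \sum_{k=1}^{b_n} h(X_k)$ remains bounded w.p.
1 as $n \to \infty$. Next, for all $\epsilon > 0$ and sufficiently large $n(\epsilon)$,

$$
\begin{aligned}
\left\| \frac{1}{b_n} \sum_{k=n-b_n+1}^{n} h(X_k) \right\|
&= \frac{1}{b_n} \left\| \sum_{k=1}^{n} h(X_k) - \sum_{k=1}^{n-b_n} h(X_k) \right\| \\
&= \frac{1}{b_n} \left\| \sum_{k=1}^{n} h(X_k) - nE_F h + (n - b_n)E_F h + b_n E_F h - L_h B(n) + L_h B(n - b_n) + L_h \left( B(n) - B(n - b_n) \right) - \sum_{k=1}^{n-b_n} h(X_k) \right\| \\
&\le \frac{1}{b_n} \left\| \sum_{k=1}^{n} h(X_k) - nE_F h - L_h B(n) \right\| + \frac{1}{b_n} \left\| \sum_{k=1}^{n-b_n} h(X_k) - (n - b_n)E_F h - L_h B(n - b_n) \right\| \\
&\quad + \frac{1}{b_n} \left\| L_h \left( B(n) - B(n - b_n) \right) + b_n E_F h \right\| \\
&< \frac{1}{b_n} D_h \psi_h(n) + \frac{1}{b_n} D_h \psi_h(n - b_n) + \frac{1}{b_n} \left\| L_h \left( B(n) - B(n - b_n) \right) \right\| + \| E_F h \| \quad \text{by (2.4)} \\
&\le \frac{1}{b_n} D_h \psi_h(n) + \frac{1}{b_n} D_h \psi_h(n - b_n) + \frac{1}{b_n} \| L_h \| \left( \sum_{i=1}^{p} \left| B^{(i)}(n) - B^{(i)}(n - b_n) \right|^2 \right)^{1/2} + \| E_F h \| \\
&\le \frac{2}{b_n} D_h \psi_h(n) + \frac{1}{b_n} \| L_h \| \left( \sum_{i=1}^{p} \sup_{0 \le s \le b_n} \left| B^{(i)}(n) - B^{(i)}(n - s) \right|^2 \right)^{1/2} + \| E_F h \| \\
&< \frac{2}{b_n} D_h \psi_h(n) + \frac{1}{b_n} \| L_h \| \, p^{1/2} (1+\epsilon) \left( 2b_n \left( \log \frac{n}{b_n} + \log\log n \right) \right)^{1/2} + \| E_F h \| \quad \text{by Lemma 2} \\
&\le \| E_F h \| + \frac{2}{b_n} D_h \psi_h(n) + O\left( (b_n^{-1} \log n)^{1/2} \right).
\end{aligned}
$$

Thus by the assumptions $\left\| b_n^{-1} \sum_{k=n-b_n+1}^{n} h(X_k) \right\|$ stays bounded w.p. 1 as $n \to \infty$.

□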


Lemma 9. Suppose the strong invariance principles (2.3) and (2.4) hold. In addition, suppose
Conditions 1 and 2 hold and $n > 2b_n$, $b_n n^{-1} \sum_{k=1}^{b_n} k |\Delta_1 w_n(k)| \to 0$, $b_n^{-1} \psi(n) \to 0$, $b_n^{-1} \psi_h(n) \to 0$
as $n \to \infty$ and $b_n^{-1} \log n = O(1)$. Then, $d_n \to 0$ with probability 1 as $n \to \infty$.


Proof. For $i, j = 1, \dots, p$, let $d_{n,ij}$ denote the $(i,j)$th element of the matrix $d_n$. We can follow the
same steps as in Lemma 7 to obtain

$$
|d_{n,ij}| \le \frac{1}{b_n} \left( \frac{1}{2} \sum_{l=1}^{2b_n} Z_l^{(i)2} + \frac{1}{2} \sum_{l=1}^{2b_n} Z_l^{(j)2} + \frac{1}{2} \sum_{l=n-2b_n+1}^{n} Z_l^{(i)2} + \frac{1}{2} \sum_{l=n-2b_n+1}^{n} Z_l^{(j)2} \right)
\cdot \frac{b_n}{n} \left( \sum_{t=1}^{b_n} |\Delta_1 w_n(t)| + 2 \sum_{s=1}^{b_n-1} \sum_{t=1}^{b_n-s} |\Delta_1 w_n(s+t)| \right).
$$

The second term in the product converges to 0 by Lemma 6. It remains to show that the following
remains bounded with probability 1 as $n \to \infty$,

$$
\frac{1}{b_n} \left( \frac{1}{2} \sum_{l=1}^{2b_n} Z_l^{(i)2} + \frac{1}{2} \sum_{l=1}^{2b_n} Z_l^{(j)2} + \frac{1}{2} \sum_{l=n-2b_n+1}^{n} Z_l^{(i)2} + \frac{1}{2} \sum_{l=n-2b_n+1}^{n} Z_l^{(j)2} \right).
$$

We have,
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1 2 fen 

L_ \ " 7 (02 


;=i 


2 b n 

E (*f 


l=n— 26 n +l 


l=n— 26 n +l 


26 r 




2 bn 


^y/ i)2 -2y«F« + (yW) . 

Z=1 {=1 1=1 

By the strong invariance principle for g, —> 0, —> 0, and (Y^) 2 -> 0 w.p. 1 as n -> 00 . By 

Lemma 8 , ( 2b n )~ 1 Yllti Y^ 2 remains bounded w.p. 1 as n —> 00 . Thus ( 2b n )~ 1 Ylftl Z^ 2 remains 
bounded w.p. 1 as n —> 00 . Similarly (2 b n )~ l Ylf=\ zj 3> ' 2 stay bounded w.p. 1 as n —> 00 . Now 
consider 
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2 V r 


E Z , 1 

l=n—2b n -\-l 


(02 _ 


1 n 

k. E ( 


y(0 _ 
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l=n—2b n +l 
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1 n 1 n 

_ v y« 2 _2yW— V y 


(0 


+ 


(A.21) 


l=n— 26 n +l 


l=n— 2& n +l 

We will first show that (26 ri ,) _1 S" =n _ 2 b n +i remains bounded with probability 1. Let Y tl denote 
the ith diagonal entry of E, then 


2 br 


E 15 

l=n—2b n -\-l 

1 
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n—2b n N 

EE } - E 


2 b n 
1 

26^ 


v. z=i 
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Z =1 
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2b n 


( n—2b n 


E;^ (i) -V^isW(n) 

Z=1 / 

+ (s W (n) - S«(n - 26 n )) 


E ^ (i) -^S (i) (n-26 ri 
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< 


2 b„ 


n n—2b„ 

J2 Y i (i) - V^iB {i) (n) +^- J2 Y ^ ~ - 2 b n ) 

i=i n i=i 

V(n) - B®(n - 2bn)) 



+ w V Y u - BW(n - 2b n )) 

< ^—Diffn) + - 2b n ) + sup \B^(n) - B^\n ~ s)\ by (2.3) 

^o n Zu n ZO n 0<s<2b n 

9D , - 1 r / n. M 1 / 2 


< w,' Hn>+v% ‘i (1+e) 


2(2 b n ) (log + log log n ) 

zo n |_ \ / _ 

< Db^ipin) + V^(l + e)(26“ 1 logn) 1/2 

= 0(6“V(n)) + log n) 1/2 ) . 


by Lemma (2) 


^ / \ u n r Vv/ K ^ / \\ u n iU o ,fj ) ) • 

By the strong invariance principle for g , L,, 1 '* —> 0 and (Yn^) 2 —> 0 w.p. 1 as n —> oo. By Lemma 
(2 bn)' 1 YZ=n-2b n+ 1 y/ t)2 remains bounded w.p. 1 as n —> oc 
(2 bn)- 1 £lLn- 2fe „ + l remains bounded w.p. 1 as n 

remains bounded w.d. 1 as n —> oo. 


, . 0 w.p. 1 as .. . _ 

oo. Combining these results in (A.21), 

r-< . ^ , \ 1 *— 


Lemma 10. (Billingsley, 2008) For a family of random variables $\{X_n : n \ge 1\}$, if $E(|X_n|) \le s_n$
where $s_n$ is a sequence such that $\sum_{n=1}^{\infty} s_n < \infty$, then $X_n \to 0$ w.p. 1 as $n \to \infty$.

Lemma 11. (Whittle, 1960) Let $R_1, \dots, R_n$ be i.i.d. standard normal variables and
$A = \sum_{l=1}^{n} \sum_{k=1}^{n} a_{lk} R_l R_k$ where $a_{lk}$ are real coefficients, then for $c \ge 1$ and for some constant $K_c$,
we have

$$
E\left[ |A - EA|^{2c} \right] \le K_c \left( \sum_{l} \sum_{k} a_{lk}^2 \right)^c.
$$

Lemma 12. Let Conditions 1 and 2 hold and assume that

(a) there exists a constant $c \ge 1$ such that $\sum_n (b_n/n)^c < \infty$,

(b) $b_n n^{-1} \log n \to 0$ as $n \to \infty$,

then $\tilde{\Sigma}_S \to I_p$ w.p. 1 as $n \to \infty$.

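Proof. Under the same conditions, Theorem 4.1 in Damerdji (1991) shows $\tilde{\Sigma}_{S,ii} \to 1$ as $n \to \infty$ w.p.
1. It is left to show that for all $i, j = 1, \dots, p$, and $i \ne j$, $\tilde{\Sigma}_{S,ij} \to 0$ w.p. 1 as $n \to \infty$. Recall that

$$
\begin{aligned}
\tilde{\Sigma}_{S,ij}
&= \sum_{s=-(b_n-1)}^{b_n-1} w_n(s) \tilde{\gamma}_{n,ij}(s)
= \tilde{\gamma}_{n,ij}(0) + \sum_{s=1}^{b_n-1} w_n(s) \left[ \tilde{\gamma}_{n,ij}(s) + \tilde{\gamma}_{n,ij}(-s) \right] \\
&= \tilde{\gamma}_{n,ij}(0) + \sum_{s=1}^{b_n-1} w_n(s) \Bigg[ \frac{1}{n} \sum_{t=1}^{n-s} U_t^{(i)} U_{t+s}^{(j)} + \frac{1}{n} \sum_{t=1+s}^{n} U_t^{(i)} U_{t-s}^{(j)} - 2\left( 1 + \frac{s}{n} \right) \bar{B}_n^{(i)} \bar{B}_n^{(j)} \\
&\qquad + \frac{1}{n} \bar{B}_n^{(j)} \left( B^{(i)}(n) - B^{(i)}(n-s) \right) + \frac{1}{n} \bar{B}_n^{(i)} \left( B^{(j)}(n) - B^{(j)}(n-s) \right) + \frac{1}{n} \bar{B}_n^{(i)} B^{(j)}(s) + \frac{1}{n} B^{(i)}(s) \bar{B}_n^{(j)} \Bigg].
\end{aligned}
\tag{A.22}
$$

We will show that each of the terms goes to 0 with probability 1 as $n \to \infty$.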


1 . 


7»,«(0) = -E T « T « 


t=l 

n 


^(c/W-sW) (uP-bP) 

n t= 1 

i £ [,(%«) _ fiO)i £ £,,«> - B«i £ t/« + B«B« (A.23) 


i=l 


t=l 


t=l 


We will show that each of the terms in (A.23) goes to 0 with probability 1, as n —>• oo. First, we 
will use Lemma 11 to show that n~ 1 —> 0 with probability 1 as n —> oo. Define 


Ri = U?\ R 2 = U?\...,Rn = U%\ R n+ 1 = U\ j \. ..,R 2n = 

Thus, {Ri : 1 < i < 2n} is an i.i.d sequence of normally distributed random variables. Define for 
1 < l,k,< 2 n, 

{ —, if 1 < l < n and k = l + n 
n 

0 otherwise . 


Then, 


2 n 2 n 


A := E E a ik R i R k = E • 


i=i k=i t =i 

By Lemma 11, for all c > 1 there exists K c such that 


E[\A-EA\ 2c ]<K c 2E 


a 2 ik 


l k 
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Since i / j. E[A] = 0, 


E 


;iW 


t= 1 


2 c' 


2n 2n 




a fk 


1=1 k= 1 




Note that J2%°=o n c < oo for all c > 1, hence by Lemma 10, n 
probability 1 as n —> oo. Next in (A.23), 


0 with 


s K^ u ‘‘^u sU) <"> 


t=i 
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E»i 


(0 


t=i 


1 


< -(1 + e) J2n log log n 
n 


' ' —(®) 
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t= l 

1/2 n 


by Lemma 2 


<^( 1+ «)(!5i5r 

V ri J n ^ 


By the classical SLLN 


Similarly, 


Finally, 


^ ”(*) 


n 




t=i 


0 w.p. 1 as n —> oo. 


i n 


w.p. 1 as n —>• oo . 


i=i 


B®B® < \ B®(n) B {j \n) 
n z 

< —=■ (1 + e) 2 (2nloglogn) by Lemma 2 


n' 


< 2(1 + e) 


2 / log" 

n 


0 as n —> oo 


Thus, 7n,ij(0) —>• 0 with probability 1 as n -l oo. 


2. Now consider the term w n(s)n 1 Define 


Ri = U?\ R 2 = U?\...,R n = U®, R {n+1) = U[ j \ ...,R 2n = U® . 
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Thus, {Ri : 1 < i < 2n} is an i.i.d sequence of normally distributed random variables. Next, 
define for 1 < l, k < 2 n 


aik 


w n (k - (n + l )), 


if 1 < l < n 


otherwise. 


1, n + 2 < k < 2 n, and 1 < k 


(n + l) < b n - 1 


Then, 

2 n 2 n 

A := EE aikRiRk 

1=1 k =1 

n— 1 2n ^ 
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3=1 1=1 U 
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s=l 1=1 


Using Lemma 11, for c > 1 and some constant K c 


E 


fbn-l 


2 c 


E^+E u ; u < 


(*) 7tD 
t+s 


. S=1 


t=l 


- ^ (EE EE a ^-) ’ 
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Thus, by Assumption (a) and Lemma 10, 
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w n (s) — uj. l> uj-^ s -+ 0 w.p. I as n —> oo. 


S = 1 


t=l 


37 




3. By letting t — s = l 


b n -1 


i y ’ n ^ i 

J2 W n(s)~ J2 U i l) Ui J J s =J2 Wn ^-J2 U i+s U l 


(*) rrO) 


S=1 


l=l+s 


5—1 


Z=1 


This is similar to the previous part with just the i and j components interchanged. A similar 
argument will lead to ^sLi 1 w n{s)n~ l Y^t=\+s 0 with probability 1 as n —> oo. 


4. 


bn~ 1 


s =1 


< 



Y 2 w n (s) ^1 + 

hi - 1 

E 2lO n (s) (l + 

S=1 

< 5^2K( S )|(l + ^)|sW||^) 


s =1 
2 

< -2 
n z 


?n — 1 

^ (l + #(n) £ 0) (n) 


5=1 


—1 


< ^2 (1 + e) 2 2nloglogn ^ ^1 + 


S=1 


i-i 


<4(l + e) 2 n 1 log to Y y 2 


5=1 


<8(l + e) 2 6 n n 1 logn 
-> 0. 


since |tc n (s)| < 1 
by Lemma 2 


5. Next 


l)n 1 1 _ . / . \ 
w n (s)—B— B^\n — s)j 


5=1 


< 


bn — 1 


Y (bW(b) - B (i) (n - s)) 


S=1 
n -1 


V B^\n) B^\n) — B^(n — s) since |iu n (s)| < 1 
n l 


5=1 

1 

< 2 


6n —1 


B®(n)| V sup |B w (n)-B w ( n — m) | 


^ 0<m<6- 
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c ^2 ((1+ e)(2nloglogn) 1/2 ^ (1 + e) ^2 b n 

< 2 1/2 (1 + e) 2j ^(nlogn) 1/2 (46 n logn) 1/2 6 n 

n z 

fh X 1 / 2 

< 2 3 / 2 (1 + e) 2 J n~ l b n log n 
-)• 0 . 


n 


+ log log n 


1/2 &n-i 

Ei 


5=1 


by Lemma 2 


6. Similar to the previous term, but exchanging the i and j indices, 


b n —l 


E w n (s)^BW (bV\ n) - B^\n - s)) 


S=1 


—> 0 with probability 1 as n —> oo. 


7. 




71 

5=1 


< 


bn 1 -1 

s=l 


bn~ 1 


< E kooi* b u \s) 


s= 1 
1 

< -2 
n z 


B*>(n 


n 

bn~ 1 

E 

S=1 


s) since |iu ri (s) | < 1 


bn 1 


< -o 
n z 


(1 + e)(2nloglogn) 1 ^ 2 E. sup by Lemma 2 


S _1 1 <m<b n 


i-l 


< —^-(1 + e)(2nloglogn) 1 ^ 2 sup \B^\m + 0) — 5^ (0)1 E 1 

n 1 l<m<b~ - 


s=l 


< -|(1 + e)(2nloglogn) 1//2 sup sup | B®{t + m) — 

71 0<t<n—b n 0 <m<b n 

< ^(1 + e)(2nloglogn) 1//2 (l + e) ^ 2 b n ^log + log log ri\ ^ 

h 1/2 

= 2 3/2 (l + ef^j^b n n~ l log n 
n l / z 


0. 
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8. Similar to the previous term, by exchanging the i and j index, 


b n —l y 

Y wJs)-B^B^(s) -> 0 w.p. 1 as n —> oo. 
n 

S=1 

Since each term in (A.22) goes to 0, we get that 

'Ps.ij —>• 0 w.p. 1 as n -Mx>. 


■SI 

Lemma 13. Let Conditions 1 and 2 hold. In addition, suppose there exists a constant $c \ge 1$ such
that $\sum_n (b_n/n)^c < \infty$, $n > 2b_n$, $b_n n^{-1} \log n \to 0$, and $b_n n^{-1} \sum_{k=1}^{b_n} k |\Delta_1 w_n(k)| \to 0$, then $\tilde{\Sigma}_{w,n} \to I_p$
w.p. 1 as $n \to \infty$ where $I_p$ is the $p \times p$ identity matrix.

Proof. The result follows from Lemmas 5, 7 and 12. □


The following corollary is an immediate consequence of the previous lemma.

Corollary 3. Under the conditions of Lemma 13, $L \tilde{\Sigma}_{w,n} L^T \to LL^T = \Sigma$ w.p. 1 as $n \to \infty$.
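Corollary 3 transfers the identity-covariance result to $\Sigma = LL^T$ because the weighted estimator is equivariant under linear maps: applying it to $LU_t$ gives $L$ times the estimator of $U_t$ times $L^T$. The sketch below illustrates this with simulated standard normal increments. It is only an illustration under assumptions: it uses the Bartlett window, for which $\Delta_2 w_n(k)$ equals $1/b_n$ at $k = b_n$ and 0 for $k < b_n$ (assuming the usual second difference), so (A.11) reduces to a sum over overlapping batches of length $b_n$. The function name `bartlett_msve` is hypothetical.

```python
import numpy as np

def bartlett_msve(Y, b):
    """Weighted form of the estimator for the Bartlett window.

    Only the k = b term of the double sum survives, with weight 1/b.
    """
    n, p = Y.shape
    Z = Y - Y.mean(axis=0)
    S = np.vstack([np.zeros(p), np.cumsum(Z, axis=0)])  # S[m] = Z_1 + ... + Z_m
    V = S[b:] - S[:-b]  # row l is Z_{l+1} + ... + Z_{l+b}, l = 0, ..., n - b
    return V.T @ V / (n * b)

rng = np.random.default_rng(0)
U = rng.standard_normal((200, 3))          # rows play the role of U_1, ..., U_n
L = np.tril(rng.standard_normal((3, 3)))   # an arbitrary lower triangular matrix
C_increments = U @ L.T                     # rows are L U_t
lhs = bartlett_msve(C_increments, 10)
rhs = L @ bartlett_msve(U, 10) @ L.T
```

The two matrices `lhs` and `rhs` agree up to floating-point error, which is the algebraic fact behind the corollary.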
Lemma 14. Suppose (2.3) holds and Conditions 1 and 2 hold. If as $n \to \infty$,

( b n \ 2 

Y |A 2 w n (fc)| —» 0 and 


i>{n ) 2 E \A 2 w n (k)\ ->• 0, 


(A.24) 
(A.25) 


k =i 


then $\hat{\Sigma}_{w,n} \to \Sigma$ w.p. 1.

Proof. For $i, j = 1, \dots, p$, let $\Sigma_{ij}$ and $\hat{\Sigma}_{w,ij}$ denote the $(i,j)$th element of $\Sigma$ and $\hat{\Sigma}_{w,n}$ respectively.
Recall

^ h h 

1 


= - E E k 2 A 2 W n (k) Y®(k) - W Yi {j \k) - Y® 

1=0 k=l 


We have 


| Ejj | 


° w n w n 

E E fc2A 2 Y®{k) - y n w 


«=0 fc=l 


- E 


*3 
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1 

n 

- E 


° w n w n 

EE k 2 A 2 Wn {k) lY^ik) - y n w ± cf\k) ± c®\ if } (fc) - if) ± cy\k) ± c f 


1=0 k =1 


V 

n—b n b n 


E E fc2A 2^n(fc) [(if } (£0 - c, w (fc)) + (cf (&) - c«) - (if - cf 


«=0 fc=l 


yf )(£;) - cf 


+ [cl j \k)~ 


< 


1 O n O n 

E ^k 2 A 2 w n (k) \ C^\k) 


1=0 k =1 

71 bn bn 


C\ j \k ) - 


-Si 


n I ^ij 


+ ^ E E fc2 l A 2^n(fc)| |(iff) - cf f)) (iff) - c® (fc)) 


i=0 fc=l 


+1 

(iff) - f)) (if - c ®) 




+ 

(iff)-(ff)) (c®(k)-cp) 

+ 

(r« - <?») (?® - c») 

+ 

(if - cf) (iff) - cl j) (fc)) 

+ 


» - <J«) (cf (fc) - <J«) 


+ 


cr(k)-cw (Y['(k) — C[ 


+ 


f°(fc) - 


(A.26) 


We will show that each of the nine terms in (A.26) goes to 0 with probability 1 as $n \to \infty$. To do that, let us first establish a useful inequality. From (2.3), for any component $i$, and sufficiently large $n$,

\[
\left| \sum_{t=1}^{n} Y_t^{(i)} - C^{(i)}(n) \right| < D\psi(n). \tag{A.27}
\]


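1.

\[
\left| \frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 \Delta_2 w_n(k) \left[ \bar{C}_l^{(i)}(k) - \bar{C}_n^{(i)} \right] \left[ \bar{C}_l^{(j)}(k) - \bar{C}_n^{(j)} \right] - \Sigma_{ij} \right|
\]

Notice that

\[
\frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 \Delta_2 w_n(k) \left[ \bar{C}_l^{(i)}(k) - \bar{C}_n^{(i)} \right] \left[ \bar{C}_l^{(j)}(k) - \bar{C}_n^{(j)} \right]
\]

is equivalent to the $ij$th entry in $L\tilde{\Sigma}_{w,n}L^T$. Then by Corollary 3,

\[
\left| \frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 \Delta_2 w_n(k) \left[ \bar{C}_l^{(i)}(k) - \bar{C}_n^{(i)} \right] \left[ \bar{C}_l^{(j)}(k) - \bar{C}_n^{(j)} \right] - \Sigma_{ij} \right| \to 0 \quad \text{as } n \to \infty \text{ w.p. 1.}
\]

2.

\[
\frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \left( \bar{Y}_l^{(i)}(k) - \bar{C}_l^{(i)}(k) \right) \left( \bar{Y}_l^{(j)}(k) - \bar{C}_l^{(j)}(k) \right) \right|
\]

Note that for any component $i$,

\begin{align*}
\left| \bar{Y}_l^{(i)}(k) - \bar{C}_l^{(i)}(k) \right|
&= \frac{1}{k}\left| \sum_{t=1}^{k} Y_{l+t}^{(i)} - C^{(i)}(k+l) + C^{(i)}(l) \right| \\
&= \frac{1}{k}\left| \sum_{t=1}^{l+k} Y_t^{(i)} - \sum_{t=1}^{l} Y_t^{(i)} - C^{(i)}(k+l) + C^{(i)}(l) \right| \\
&\le \frac{1}{k}\left[ \left| \sum_{t=1}^{l+k} Y_t^{(i)} - C^{(i)}(k+l) \right| + \left| \sum_{t=1}^{l} Y_t^{(i)} - C^{(i)}(l) \right| \right] \\
&< \frac{1}{k}\left[ D\psi(l+k) + D\psi(l) \right] \quad \text{by (A.27)} \\
&\le \frac{2}{k} D\psi(n) \quad \text{since } l + k \le n. \tag{A.28}
\end{align*}

By (A.28),

\begin{align*}
\frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \left( \bar{Y}_l^{(i)}(k) - \bar{C}_l^{(i)}(k) \right) \left( \bar{Y}_l^{(j)}(k) - \bar{C}_l^{(j)}(k) \right) \right|
&< \frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} |\Delta_2 w_n(k)| \left( 2D\psi(n) \right)^2 \\
&= 4D^2 \left( \frac{n-b_n+1}{n} \right) \psi(n)^2 \sum_{k=1}^{b_n} |\Delta_2 w_n(k)| \\
&\to 0 \quad \text{as } n \to \infty \text{ with probability 1.}
\end{align*}

3.

\[
\frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \left( \bar{Y}_l^{(i)}(k) - \bar{C}_l^{(i)}(k) \right) \left( \bar{Y}_n^{(j)} - \bar{C}_n^{(j)} \right) \right|
\]

Note that for any component $i$, using (A.27),

\[
\left| \bar{Y}_n^{(i)} - \bar{C}_n^{(i)} \right| = \frac{1}{n}\left| \sum_{t=1}^{n} Y_t^{(i)} - C^{(i)}(n) \right| < \frac{1}{n} D\psi(n). \tag{A.29}
\]

By (A.28) and (A.29),

\begin{align*}
\frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \left( \bar{Y}_l^{(i)}(k) - \bar{C}_l^{(i)}(k) \right) \left( \bar{Y}_n^{(j)} - \bar{C}_n^{(j)} \right) \right|
&< \frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k\, |\Delta_2 w_n(k)| \left( 2D\psi(n) \right) \left( \frac{1}{n} D\psi(n) \right) \\
&= 2D^2\psi(n)^2 \left( \frac{n-b_n+1}{n} \right) \frac{1}{n}\sum_{k=1}^{b_n} k\, |\Delta_2 w_n(k)| \\
&\le 2D^2\psi(n)^2 \left( \frac{n-b_n+1}{n} \right) \frac{b_n}{n}\sum_{k=1}^{b_n} |\Delta_2 w_n(k)| \\
&\to 0 \quad \text{as } n \to \infty \text{ with probability 1.}
\end{align*}

4. Now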


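\begin{align*}
&\frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \left( \bar{Y}_l^{(i)}(k) - \bar{C}_l^{(i)}(k) \right) \left( \bar{C}_l^{(j)}(k) - \bar{C}_n^{(j)} \right) \right| \\
&\quad \le \frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \left( \bar{Y}_l^{(i)}(k) - \bar{C}_l^{(i)}(k) \right) \bar{C}_l^{(j)}(k) \right|
 + \frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \left( \bar{Y}_l^{(i)}(k) - \bar{C}_l^{(i)}(k) \right) \bar{C}_n^{(j)} \right|.
\end{align*}

We will show that both parts of the sum converge to 0 with probability 1 as $n \to \infty$. Consider the first sum.

\begin{align*}
&\frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \left( \bar{Y}_l^{(i)}(k) - \bar{C}_l^{(i)}(k) \right) \bar{C}_l^{(j)}(k) \right| \\
&\quad \le \frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \bar{Y}_l^{(i)}(k) - \bar{C}_l^{(i)}(k) \right| \left| \bar{C}_l^{(j)}(k) \right| \\
&\quad < \frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k\, |\Delta_2 w_n(k)| \left( 2D\psi(n) \right) \left( \frac{1}{k}\, 2(1+\epsilon)\sqrt{b_n \Sigma_{jj}}\, (\log n)^{1/2} \right) \quad \text{by (A.2) and (A.28)} \\
&\quad = \left( \frac{n-b_n+1}{n} \right) 4D(1+\epsilon)\sqrt{\Sigma_{jj}\, b_n \log n}\; \psi(n) \sum_{k=1}^{b_n} |\Delta_2 w_n(k)| \\
&\quad \to 0 \quad \text{by Condition 2 and (A.24).}
\end{align*}

The second part is

\begin{align*}
&\frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \left( \bar{Y}_l^{(i)}(k) - \bar{C}_l^{(i)}(k) \right) \bar{C}_n^{(j)} \right| \\
&\quad \le \frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \bar{Y}_l^{(i)}(k) - \bar{C}_l^{(i)}(k) \right| \left| \bar{C}_n^{(j)} \right| \\
&\quad < \frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k\, |\Delta_2 w_n(k)| \left( 2D\psi(n) \right) \left( \frac{1}{n}(1+\epsilon) \left[ 2n\Sigma_{jj} \log\log n \right]^{1/2} \right) \quad \text{by (A.28) and (A.1)} \\
&\quad < \left( \frac{n-b_n+1}{n} \right) 2^{3/2}\sqrt{\Sigma_{jj}}\, D(1+\epsilon)\psi(n) \left( \frac{\log n}{n} \right)^{1/2} \sum_{k=1}^{b_n} k\, |\Delta_2 w_n(k)| \\
&\quad \le \left( \frac{n-b_n+1}{n} \right) 2^{3/2}\sqrt{\Sigma_{jj}}\, D(1+\epsilon)\psi(n)\, \frac{(\log n)^{1/2}}{n^{1/2}}\, b_n \sum_{k=1}^{b_n} |\Delta_2 w_n(k)| \\
&\quad = \left( \frac{n-b_n+1}{n} \right) \frac{b_n^{1/2}}{n^{1/2}}\, 2^{3/2}\sqrt{\Sigma_{jj}}\, D(1+\epsilon)\psi(n) (b_n \log n)^{1/2} \sum_{k=1}^{b_n} |\Delta_2 w_n(k)| \\
&\quad \to 0 \quad \text{by Condition 2 and (A.24).}
\end{align*}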


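5. Next,

\begin{align*}
\frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \left( \bar{Y}_n^{(i)} - \bar{C}_n^{(i)} \right) \left( \bar{Y}_n^{(j)} - \bar{C}_n^{(j)} \right) \right|
&< \frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)|\, \frac{1}{n^2} D^2\psi(n)^2 \quad \text{by (A.29)} \\
&\le \left( \frac{n-b_n+1}{n} \right) \frac{b_n^2}{n^2}\, D^2\psi(n)^2 \sum_{k=1}^{b_n} |\Delta_2 w_n(k)| \\
&\to 0 \quad \text{by Condition 2 and (A.25).}
\end{align*}

6.

\begin{align*}
\frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \left( \bar{Y}_n^{(i)} - \bar{C}_n^{(i)} \right) \left( \bar{Y}_l^{(j)}(k) - \bar{C}_l^{(j)}(k) \right) \right|
&< \frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k\, |\Delta_2 w_n(k)| \left( \frac{1}{n} D\psi(n) \right) \left( 2D\psi(n) \right) \\
&= \left( \frac{n-b_n+1}{n} \right) 2D^2\psi(n)^2\, \frac{1}{n}\sum_{k=1}^{b_n} k\, |\Delta_2 w_n(k)| \\
&\le \left( \frac{n-b_n+1}{n} \right) 2D^2\psi(n)^2\, \frac{b_n}{n}\sum_{k=1}^{b_n} |\Delta_2 w_n(k)| \\
&\to 0 \quad \text{by Condition 2 and (A.25).}
\end{align*}

7.

\begin{align*}
&\frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \left( \bar{Y}_n^{(i)} - \bar{C}_n^{(i)} \right) \left( \bar{C}_l^{(j)}(k) - \bar{C}_n^{(j)} \right) \right| \\
&\quad \le \frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \left( \bar{Y}_n^{(i)} - \bar{C}_n^{(i)} \right) \bar{C}_l^{(j)}(k) \right|
 + \frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \left( \bar{Y}_n^{(i)} - \bar{C}_n^{(i)} \right) \bar{C}_n^{(j)} \right|.
\end{align*}

We will show that each of the two terms goes to 0 with probability 1 as $n \to \infty$.

\begin{align*}
&\frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \left( \bar{Y}_n^{(i)} - \bar{C}_n^{(i)} \right) \bar{C}_l^{(j)}(k) \right| \\
&\quad < \frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left( \frac{1}{n} D\psi(n) \right) \left( \frac{1}{k}\, 2(1+\epsilon)\sqrt{b_n \Sigma_{jj}}\, (\log n)^{1/2} \right) \quad \text{by (A.2) and (A.29)} \\
&\quad = \left( \frac{n-b_n+1}{n} \right) 2D(1+\epsilon)\sqrt{\Sigma_{jj}}\, \frac{1}{n}\sqrt{b_n \log n}\; \psi(n) \sum_{k=1}^{b_n} k\, |\Delta_2 w_n(k)| \\
&\quad \le \left( \frac{n-b_n+1}{n} \right) 2D(1+\epsilon)\sqrt{\Sigma_{jj}}\, \frac{b_n}{n}\sqrt{b_n \log n}\; \psi(n) \sum_{k=1}^{b_n} |\Delta_2 w_n(k)| \\
&\quad \to 0 \quad \text{by Condition 2 and (A.24).}
\end{align*}

For the second term,

\begin{align*}
&\frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \left( \bar{Y}_n^{(i)} - \bar{C}_n^{(i)} \right) \bar{C}_n^{(j)} \right| \\
&\quad = \frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \bar{Y}_n^{(i)} - \bar{C}_n^{(i)} \right| \left| \bar{C}_n^{(j)} \right| \\
&\quad < \frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left( \frac{1}{n} D\psi(n) \right) \left( \frac{1}{n}(1+\epsilon) \left[ 2n\Sigma_{jj} \log\log n \right]^{1/2} \right) \quad \text{by (A.29) and (A.1)} \\
&\quad \le \left( \frac{n-b_n+1}{n} \right) \frac{b_n^2}{n^2}\, D(1+\epsilon) \left[ 2n\Sigma_{jj} \log\log n \right]^{1/2} \psi(n) \sum_{k=1}^{b_n} |\Delta_2 w_n(k)| \\
&\quad < \left( \frac{n-b_n+1}{n} \right) \sqrt{2\Sigma_{jj}}\, D(1+\epsilon)\, \frac{b_n^2}{n^{3/2}} (\log n)^{1/2} \psi(n) \sum_{k=1}^{b_n} |\Delta_2 w_n(k)| \\
&\quad = \left( \frac{n-b_n+1}{n} \right) \sqrt{2\Sigma_{jj}}\, D(1+\epsilon)\, \frac{b_n}{n}\, \frac{b_n^{1/2}}{n^{1/2}} (b_n \log n)^{1/2} \psi(n) \sum_{k=1}^{b_n} |\Delta_2 w_n(k)| \\
&\quad \to 0 \quad \text{by Condition 2 and (A.24).}
\end{align*}

8.

\[
\frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \left( \bar{C}_l^{(i)}(k) - \bar{C}_n^{(i)} \right) \left( \bar{Y}_l^{(j)}(k) - \bar{C}_l^{(j)}(k) \right) \right|
\]

This term is the same as term 4 except for a change of components. Thus the same argument can be used to show that it converges to 0 with probability 1 as $n \to \infty$.

9.

\[
\frac{1}{n}\sum_{l=0}^{n-b_n}\sum_{k=1}^{b_n} k^2 |\Delta_2 w_n(k)| \left| \left( \bar{C}_l^{(i)}(k) - \bar{C}_n^{(i)} \right) \left( \bar{Y}_n^{(j)} - \bar{C}_n^{(j)} \right) \right|
\]

This term is the same as term 7 except for a change of components. Thus the same argument can be used to show that it converges to 0 w.p. 1 as $n \to \infty$.

Since each of the nine terms converges to 0 with probability 1, $|\hat{\Sigma}_{w,n,ij} - \Sigma_{ij}| \to 0$ as $n \to \infty$ with probability 1. □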


Since we proved that $\hat{\Sigma}_S = \hat{\Sigma}_{w,n} + d_n \to \Sigma + 0$ as $n \to \infty$ with probability 1, we have the desired result for Theorem 1.
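
The weighted-batch form of $\hat{\Sigma}_{w,n}$ used throughout the proof can be computed directly from its double sum. The sketch below does so in plain Python. The Bartlett lag window, the i.i.d. Gaussian test data, and the function names are assumptions for illustration only; they are not taken from the paper.

```python
import random

def bartlett(k, b):
    """Bartlett lag window: 1 - |k|/b for |k| < b, and 0 otherwise."""
    k = abs(k)
    return 1.0 - k / b if k < b else 0.0

def sigma_hat_w(y, b, window=bartlett):
    """(1/n) sum_{l=0}^{n-b} sum_{k=1}^{b} k^2 Delta_2 w(k) [Ybar_l(k) - Ybar_n][.]^T."""
    n, p = len(y), len(y[0])
    # prefix[t][i] = y_1^(i) + ... + y_t^(i), so batch sums are differences.
    prefix = [[0.0] * p]
    for row in y:
        prefix.append([s + v for s, v in zip(prefix[-1], row)])
    ybar_n = [s / n for s in prefix[n]]
    # Second differences of the lag window for k = 1, ..., b.
    d2 = [window(k - 1, b) - 2.0 * window(k, b) + window(k + 1, b)
          for k in range(1, b + 1)]
    est = [[0.0] * p for _ in range(p)]
    for l in range(n - b + 1):
        for k in range(1, b + 1):
            coef = k * k * d2[k - 1] / n
            if coef == 0.0:
                continue
            # Deviation of the length-k batch mean starting after index l.
            dev = [(prefix[l + k][i] - prefix[l][i]) / k - ybar_n[i]
                   for i in range(p)]
            for i in range(p):
                for j in range(p):
                    est[i][j] += coef * dev[i] * dev[j]
    return est

rng = random.Random(1)                      # hypothetical test data
y = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(400)]
est = sigma_hat_w(y, b=20)
print(est)
```

With the Bartlett window the second differences vanish for $k < b_n$ and equal $1/b_n$ at $k = b_n$, so the double sum collapses to $(b_n/n)\sum_l [\bar{Y}_l(b_n) - \bar{Y}_n][\bar{Y}_l(b_n) - \bar{Y}_n]^T$, which is an overlapping batch means form. Other windows that satisfy Condition 1 can be passed through the `window` argument.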


A.4 Proof of Theorem 2 


Let $S = \{S_t\}_{t \ge 1}$ be a strictly stationary stochastic process on a probability space $(\Omega, \mathcal{F}, P)$ and set $\mathcal{F}_k^l = \sigma(S_k, \dots, S_l)$. Define the $\alpha$-mixing coefficients for $n = 1, 2, 3, \dots$ as

\[
\alpha(n) = \sup_{k \ge 1}\, \sup_{A \in \mathcal{F}_1^k,\; B \in \mathcal{F}_{k+n}^{\infty}} \left| P(A \cap B) - P(A)P(B) \right|.
\]

The process $S$ is said to be strongly mixing if $\alpha(n) \to 0$ as $n \to \infty$. It is easy to see that Harris ergodic Markov chains are strongly mixing; see, for example, Jones (2004).


Theorem 5. (Kuelbs and Philipp, 1980) Let $f(S_1), f(S_2), \dots$ be an $\mathbb{R}^p$-valued stationary process such that $E_F\|f\|^{2+\delta} < \infty$ for some $0 < \delta < 1$. Let $\alpha_f(n)$ be the mixing coefficients of the process $\{f(S_t)\}_{t \ge 1}$ and suppose, as $n \to \infty$,

\[
\alpha_f(n) = O\!\left( n^{-(1+\epsilon)(1+2/\delta)} \right) \quad \text{for } \epsilon > 0.
\]

Then there exists a $p$-vector $\theta_f$, a $p \times p$ lower triangular matrix $L_f$, and a finite random variable $D_f$, such that, with probability 1,

\[
\left\| \sum_{t=1}^{n} f(S_t) - n\theta_f - L_f B(n) \right\| < D_f\, n^{1/2 - \lambda_f} \tag{A.30}
\]

for some $\lambda_f > 0$ depending on $\epsilon$, $\delta$, and $p$ only.

Corollary 4. Let $E_F\|f\|^{2+\delta} < \infty$ for some $\delta > 0$. If $X$ is a polynomially ergodic Markov chain of order $\xi > (1+\epsilon)(1+2/\delta)$ for some $\epsilon > 0$, then (A.30) holds for any initial distribution.

Proof. Let $\alpha$ be the mixing coefficient for the Markov chain $X = \{X_t\}_{t \ge 1}$ and $\alpha_f$ be the mixing coefficient for the mapped process $\{f(X_t)\}_{t \ge 1}$. Then the elementary properties of sigma-algebras (cf. Chow and Teicher, 1978, p. 16) show that $\alpha_f(n) \le \alpha(n)$ for all $n$. Since $X$ is polynomially ergodic of order $\xi$ we also have that $\alpha(n) \le E_F M\, n^{-\xi}$ for all $n$, and hence if $\xi > (1+\epsilon)(1+2/\delta)$, then $\alpha_f(n) \le E_F M\, n^{-\xi} = O(n^{-(1+\epsilon)(1+2/\delta)})$. The result follows from Theorem 5 and thus the strong invariance principle, as stated, holds at stationarity. A standard Markov chain argument (see, e.g., Proposition 17.1.6 in Meyn and Tweedie (2009)) shows that if the result holds for any initial distribution, then it holds for every initial distribution. □


Proof of Theorem 2. Since $E_F\|g\|^{4+\delta} < \infty$ implies $E_F\|g\|^{2+\delta} < \infty$ and $X$ is a polynomially ergodic Markov chain of order $\xi > (1+\epsilon)(1+2/\delta)$, we have from Corollary 4 that an SIP holds such that

\[
\left\| \sum_{t=1}^{n} g(X_t) - n\theta - LB(n) \right\| < D\, n^{1/2 - \lambda_g}
\]

for some $\lambda_g > 0$ depending on $\epsilon$, $\delta$, and $p$ only.

Since $E_F\|g\|^{4+\delta} < \infty$ implies $E_F\|h\|^{2+\delta} < \infty$ and $X$ is a polynomially ergodic Markov chain of order $\xi > (1+\epsilon)(1+2/\delta)$, we have from Corollary 4 that an SIP holds such that

\[
\left\| \sum_{t=1}^{n} h(X_t) - n\theta_h - L_h B(n) \right\| < D_h\, n^{1/2 - \lambda_h}
\]

for some $\lambda_h > 0$ depending on $\epsilon$, $\delta$, and $p$ only.

Setting $\lambda = \min\{\lambda_g, \lambda_h\}$ shows that (2.3) and (2.4) hold with

\[
\psi(n) = \psi_h(n) = n^{1/2 - \lambda}.
\]

The rest now follows easily from Theorem 1.

□
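
The order requirement $(1+\epsilon)(1+2/\delta)$ ties the number of finite moments to how fast the chain must mix. A small sketch of that arithmetic follows. The function names and the particular values of $\epsilon$, $\delta$, and $\lambda$ are hypothetical, and $\lambda$ itself is not known in practice.

```python
def required_order(eps, delta):
    """Bound (1 + eps)(1 + 2/delta) on the polynomial ergodicity order."""
    if eps <= 0 or delta <= 0:
        raise ValueError("eps and delta must be positive")
    return (1.0 + eps) * (1.0 + 2.0 / delta)

def sip_rate(n, lam):
    """psi(n) = n^(1/2 - lam) for an assumed lam > 0."""
    return n ** (0.5 - lam)

# More finite moments (larger delta) lower the required order.
for delta in (0.5, 1.0, 2.0):
    print(delta, required_order(0.1, delta))
```

Under these assumed values, going from $\delta = 0.5$ to $\delta = 2$ drops the bound from $5.5$ to $2.2$, so stronger moment conditions permit slower polynomial convergence.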